Compare commits

..

511 Commits

Author SHA1 Message Date
github-actions[bot]
5fb521061d flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/608d0ca' (2026-03-08)
  → 'github:NixOS/nixpkgs/8d8c1fa' (2026-04-02)
2026-04-05 00:36:57 +00:00
Florian Preinstorfer
23a5f1b628 Use pymdownx.magiclink with its default configuration
The docs contain bare links that are not rendered without it.
2026-04-02 21:24:27 +02:00
Florian Preinstorfer
44600550c6 Fix invisible selected menu item
A light background with white primary font makes the selected menu entry
unreadable.
2026-04-02 21:24:27 +02:00
Kristoffer Dalby
835db974b5 testdata: strip unused fields from all test data files (23MB -> 4MB)
Strip fields not consumed by any test from all 594 HuJSON test data files:

grant_results/ (248 files, 21MB -> 1.8MB):
  - Remove: timestamp, propagation_wait_seconds, input.policy_file,
    input.grants_section, input.api_endpoint, input.api_method,
    topology.nodes.mts_name, topology.nodes.socket, topology.nodes.user_id,
    captures.commands, captures.packet_filter_matches, captures.whois
  - V14-V16, V26-V36: keep stripped netmap (Peers.Name/AllowedIPs/PrimaryRoutes
    + PacketFilterRules) for via_compat_test.go compatibility
  - V17-V25: strip netmap (old topology, incompatible with via_compat harness)

acl_results/ (215 files, 1.4MB -> 1.2MB):
  - Remove: timestamp, propagation_wait_seconds, input.policy_file,
    input.api_endpoint, input.api_response_code, entire topology section
    (parsed by Go struct but completely ignored — nodes are hardcoded)

routes_results/ (92 files, unchanged — topology is actively used):
  - Remove: timestamp, propagation_wait_seconds, input.policy_file,
    input.api_endpoint, input.api_response_code

ssh_results/ (39 files, unchanged — minimal to begin with):
  - Remove: policy_file
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
30dce30a9d testdata: convert .json to .hujson with header comments
Rename all 594 test data files from .json to .hujson and add
descriptive header comments to each file documenting what policy
rules are under test and what outcome is expected.

Update test loaders in all 5 _test.go files to parse HuJSON via
hujson.Parse/Standardize/Pack before json.Unmarshal.

Add cross-dependency warning to via_compat_test.go documenting
that GRANT-V29/V30/V31/V36 are shared with TestGrantsCompat.

Add .gitignore exemption for testdata HuJSON files.
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
f693cc0851 CHANGELOG: document grants support for 0.29.0
Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
abd2b15db5 policy/v2: clean up dead error variables, stale TODO, and test skip reasons
Remove unused error variables (ErrGrantViaNotSupported, ErrGrantEmptySources, ErrGrantEmptyDestinations, ErrGrantViaOnlyTag) and the stale TODO for via implementation. Update compat test skip reasons to reflect that user:*@passkey wildcard is a known unsupported feature, not a pending implementation.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
b762e4c350 integration: remove exit node via grant tests
Remove TestGrantViaExitNodeSteering and TestGrantViaMixedSteering.
Exit node traffic forwarding through via grants cannot be validated
with curl/traceroute in Docker containers because Tailscale exit nodes
strip locally-connected subnets from their forwarding filter.

The correctness of via exit steering is validated by:
- Golden MapResponse comparison (TestViaGrantMapCompat with GRANT-V31
  and GRANT-V36) comparing full netmap output against Tailscale SaaS
- Filter rule compatibility (TestGrantsCompat with GRANT-V14 through
  GRANT-V36) comparing per-node PacketFilter rules against Tailscale SaaS
- TestGrantViaSubnetSteering (kept) validates via subnet steering with
  actual curl/traceroute through Docker, which works for subnet routes

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
c36cedc32f policy/v2: fix via grants in BuildPeerMap, MatchersForNode, and ViaRoutesForPeer
Use per-node compilation path for via grants in BuildPeerMap and MatchersForNode to ensure via-granted nodes appear in peer maps. Fix ViaRoutesForPeer golden test route inference to correctly resolve via grant effects.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
6a55f7d731 policy/v2: add via exit steering golden captures and tests
Add golden test data for via exit route steering and fix via exit grant compilation to match Tailscale SaaS behavior. Includes MapResponse golden tests for via grant route steering verification.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
bca6e6334d integration: add custom subnet support and fix exit node tests
Add NetworkSpec struct with optional Subnet field to ScenarioSpec.Networks.
When Subnet is set, the Docker network is created with that specific CIDR
instead of Docker's auto-assigned RFC1918 range.

Fix all exit node integration tests to use curl + traceroute. Tailscale
exit nodes strip locally-connected subnets from their forwarding filter
(shrinkDefaultRoute + localInterfaceRoutes), so exit nodes cannot
forward to IPs on their Docker network via the default route alone.
This is by design: exit nodes provide internet access, not LAN access.
To also get LAN access, the subnet must be explicitly advertised as a
route — matching real-world Tailscale deployment requirements.

- TestSubnetRouterMultiNetworkExitNode: advertise usernet1 subnet
  alongside exit route, upgraded from ping to curl + traceroute
- TestGrantViaExitNodeSteering: usernet1 subnet in via grants and
  auto-approvers alongside autogroup:internet
- TestGrantViaMixedSteering: externet subnet in auto-approvers and
  route advertisement for exit traffic

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
0431039f2a servertest: add regression tests for via grant filter rules
Add three tests that verify control plane behavior for grant policies:

- TestGrantViaSubnetFilterRules: verifies the router's PacketFilter
  contains destination rules for via-steered subnets. Without per-node
  filter compilation for via grants, these rules were missing and the
  router would drop forwarded traffic.

- TestGrantViaExitNodeFilterRules: same verification for exit nodes
  with via grants steering autogroup:internet traffic.

- TestGrantIPv6OnlyPrefixACL: verifies that address-based aliases
  (Prefix, Host) resolve to exactly the literal prefix and do not
  expand to include the matching node's other IP addresses. An
  IPv6-only host definition produces only IPv6 filter rules.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
ccd284c0a5 policy/v2: use per-node filter compilation for via grants
Via grants compile filter rules that depend on the node's route state
(SubnetRoutes, ExitRoutes). Without per-node compilation, these rules
were only included in the global filter path which explicitly skips via
grants (compileFilterRules skips grants with non-empty Via fields).

Add a needsPerNodeFilter flag that is true when the policy uses either
autogroup:self or via grants. filterForNodeLocked now uses this flag
instead of usesAutogroupSelf alone, ensuring via grant rules are
compiled per-node through compileFilterRulesForNode/compileViaGrant.

The filter cache also needs to account for route-dependent compilation:

- nodesHavePolicyAffectingChanges now treats route changes as
  policy-affecting when needsPerNodeFilter is true, so SetNodes
  triggers updateLocked and clears caches through the normal flow.

- invalidateGlobalPolicyCache now clears compiledFilterRulesMap
  (the unreduced per-node cache) alongside filterRulesMap when
  needsPerNodeFilter is true and routes changed.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
9db5fb6393 integration: fix error message assertion for invalid ACL action
Action.UnmarshalJSON produces the format
'action="unknown-action" is not supported: invalid ACL action',
not the reversed format the test expected.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
3ca4ff8f3f state,servertest: add grant control plane tests and fix via route ReduceRoutes filtering
Add servertest grant policy control plane tests covering basic grants, via grants, and cap grants. Fix ReduceRoutes in State to apply route reduction to non-via routes first, then append via-included routes, preventing via grant routes from being incorrectly filtered.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
5cd5e5de69 policy/v2: add unit tests for ViaRoutesForPeer
Test via route computation for viewer-peer pairs: self-steering returns
empty, viewer not in source returns empty, peer without advertised
destination returns empty, peer with/without via tag populates
Include/Exclude respectively, mixed prefix and autogroup:internet
destinations, and exit route steering.

7 subtests covering all code paths in ViaRoutesForPeer.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
08d26e541c policy/v2: add unit tests for grant filter compilation helpers
Test companionCapGrantRules, sourcesHaveWildcard, sourcesHaveDangerAll,
srcIPsWithRoutes, the FilterAllowAll fix for grant-only policies,
compileViaGrant, compileGrantWithAutogroupSelf grant paths, and
destinationsToNetPortRange autogroup:internet skipping.

51 subtests across 8 test functions covering all grant-specific code
paths in filter.go that previously had no test coverage.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
d243adaedd types,mapper,integration: enable Taildrive and add cap/drive grant lifecycle test
Add NodeAttrsTaildriveShare and NodeAttrsTaildriveAccess to the node capability map, enabling Taildrive file sharing when granted via policy. Add integration test verifying the full cap/drive grant lifecycle.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
9b1a6b6c05 integration: add cap/relay grant peer relay lifecycle test
Add ConnectToNetwork to the TailscaleClient interface for multi-network test scenarios and implement peer relay ping support. Use these to test that cap/relay grants correctly enable peer-to-peer relay connections between tagged nodes.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
8573ff9158 policy/v2: fix grant-only policies returning FilterAllowAll
compileFilterRules checked only pol.ACLs == nil to decide whether
to return FilterAllowAll (permit-any). Policies that use only Grants
(no ACLs) had nil ACLs, so the function short-circuited before
compiling any CapGrant rules. This meant cap/relay, cap/drive, and
any other App-based grant capabilities were silently ignored.

Check both ACLs and Grants are empty before returning FilterAllowAll.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
a739862c65 integration: add via grant route steering tests
Add integration tests validating that via grants correctly steer
routes to designated nodes per client group:

- TestGrantViaSubnetSteering: two routers advertise the same
  subnet, via grants steer each client group to a specific router.
  Verifies per-client route visibility, curl reachability, and
  traceroute path.

- TestGrantViaExitNodeSteering: two exit nodes, via grants steer
  each client group to a designated exit node. Verifies exit
  routes are withdrawn from non-designated nodes and the client
  rejects setting a non-designated exit node.

- TestGrantViaMixedSteering: cross-steering where subnet routes
  and exit routes go to different servers per client group.
  Verifies subnet traffic uses the subnet-designated server while
  exit traffic uses the exit-designated server.

Also add autogroupp helper for constructing AutoGroup aliases in
grant policy configurations.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
8358017dcf policy/v2,state,mapper: implement per-viewer via route steering
Via grants steer routes to specific nodes per viewer. Until now,
all clients saw the same routes for each peer because route
assembly was viewer-independent. This implements per-viewer route
visibility so that via-designated peers serve routes only to
matching viewers, while non-designated peers have those routes
withdrawn.

Add ViaRouteResult type (Include/Exclude prefix lists) and
ViaRoutesForPeer to the PolicyManager interface. The v2
implementation iterates via grants, resolves sources against the
viewer, matches destinations against the peer's advertised routes
(both subnet and exit), and categorizes prefixes by whether the
peer has the via tag.

Add RoutesForPeer to State which composes global primary election,
via Include/Exclude filtering, exit routes, and ACL reduction.
When no via grants exist, it falls back to existing behavior.

Update the mapper to call RoutesForPeer per-peer instead of using
a single route function for all peers. The route function now
returns all routes (subnet + exit), and TailNode filters exit
routes out of the PrimaryRoutes field for HA tracking.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
28be15f8ea policy/v2: handle autogroup:internet in via grant compilation
compileViaGrant only handled *Prefix destinations, skipping
*AutoGroup entirely. This meant via grants with
dst=[autogroup:internet] produced no filter rules even when the
node was an exit node with approved exit routes.

Switch the destination loop from a type assertion to a type switch
that handles both *Prefix (subnet routes) and *AutoGroup (exit
routes via autogroup:internet). Also check ExitRoutes() in
addition to SubnetRoutes() so the function doesn't bail early
when a node only has exit routes.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
687cf0882f policy/v2: implement autogroup:danger-all support
Add autogroup:danger-all as a valid source alias that matches ALL IP
addresses including non-Tailscale addresses. When used as a source,
it resolves to 0.0.0.0/0 + ::/0 internally but produces SrcIPs: ["*"]
in filter rules. When used as a destination, it is rejected with an
error matching Tailscale SaaS behavior.

Key changes:
- Add AutoGroupDangerAll constant and validation
- Add sourcesHaveDangerAll() helper and hasDangerAll parameter to
  srcIPsWithRoutes() across all compilation paths
- Add ErrAutogroupDangerAllDst for destination rejection
- Remove 3 AUTOGROUP_DANGER_ALL skip entries (K6, K7, K8)

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
4f040dead2 policy/v2: implement grant validation rules matching Tailscale SaaS
Implement comprehensive grant validation including: accept empty sources/destinations (they produce no rules), validate grant ip/app field requirements, capability name format, autogroup constraints, via tag existence, and default route CIDR restrictions.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
54db47badc policy/v2: implement via route compilation for grants
Compile grants with "via" field into FilterRules that are placed only
on nodes matching the via tag and actually advertising the destination
subnets. Key behavior:

- Filter rules go exclusively to via-nodes with matching approved routes
- Destination subnets not advertised by the via node are silently dropped
- App-only via grants (no ip field) produce no packet filter rules
- Via grants are skipped in the global compileFilterRules since they
  are node-specific

Reduces grant compat test skips from 41 to 30 (11 newly passing).

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
0e3acdd8ec policy/v2: implement CapGrant compilation with companion capabilities
Compile grant app fields into CapGrant FilterRules matching Tailscale
SaaS behavior. Key changes:

- Generate CapGrant rules in compileFilterRules and
  compileGrantWithAutogroupSelf, with node-specific /32 and /128
  Dsts for autogroup:self grants
- Add reversed companion rules for drive→drive-sharer and
  relay→relay-target capabilities, ordered by original cap name
- Narrow broad CapGrant Dsts to node-specific prefixes in
  ReduceFilterRules via new reduceCapGrantRule helper
- Skip merging CapGrant rules in mergeFilterRules to preserve
  per-capability structure
- Remove ip+app mutual exclusivity validation (Tailscale accepts both)
- Add semantic JSON comparison for RawMessage types and netip.Prefix
  comparators in test infrastructure

Reduces grant compat test skips from 99 to 41 (58 newly passing).

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
ebe0f4078d policy/v2: preserve non-wildcard source IPs alongside wildcard ranges
When an ACL source list contains a wildcard (*) alongside explicit
sources (tags, groups, hosts, etc.), Tailscale preserves the individual
IPs from non-wildcard sources in SrcIPs alongside the merged wildcard
CGNAT ranges. Previously, headscale's IPSetBuilder would merge all
sources into a single set, absorbing the explicit IPs into the wildcard
range.

Track non-wildcard resolved addresses separately during source
resolution, then append their individual IP strings to the output
when a wildcard is also present. This fixes the remaining 5 ACL
compat test failures (K01 and M06 subtests).

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
dda35847b0 policy/v2: reorder ACL self grants to match Tailscale rule ordering
When an ACL has non-autogroup destinations (groups, users, tags, hosts)
alongside autogroup:self, emit non-self grants before self grants to
match Tailscale's filter rule ordering. ACLs with only autogroup
destinations (self + member) preserve the policy-defined order.

This fixes ACL-A17, ACL-SF07, and ACL-SF11 compat test failures.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
f95b254ea9 policy/v2: exclude exit routes from ReduceFilterRules
Add exit route check in ReduceFilterRules to prevent exit nodes from receiving packet filter rules for destinations that only overlap via exit routes. Remove resolved SUBNET_ROUTE_FILTER_RULES grant skip entries and update error message formatting for grant validation.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
e05f45cfb1 policy/v2: use approved node routes in wildcard SrcIPs
Per Tailscale documentation, the wildcard (*) source includes "any
approved subnets" — the actually-advertised-and-approved routes from
nodes, not the autoApprover policy prefixes.

Change Asterix.resolve() to return just the base CGNAT+ULA set, and
add approved subnet routes as separate SrcIPs entries in the filter
compilation path. This preserves individual route prefixes that would
otherwise be merged by IPSet (e.g., 10.0.0.0/8 absorbing 10.33.0.0/16).

Also swap rule ordering in compileGrantWithAutogroupSelf() to emit
non-self destination rules before autogroup:self rules, matching the
Tailscale FilterRule wire format ordering.

Remove the unused AutoApproverPolicy.prefixes() method.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
995ed0187c policy/v2: add advertised routes to compat test topologies
Add routable_ips and approved_routes fields to the node topology
definitions in all golden test files. These represent the subnet
routes actually advertised by nodes on the Tailscale SaaS network
during data capture:

  Routes topology (92 files, 6 router nodes):
    big-router:     10.0.0.0/8
    subnet-router:  10.33.0.0/16
    ha-router1:     192.168.1.0/24
    ha-router2:     192.168.1.0/24
    multi-router:   172.16.0.0/24
    exit-node:      0.0.0.0/0, ::/0

  ACL topology (199 files, 1 router node):
    subnet-router:  10.33.0.0/16

  Grants topology (203 files, 1 router node):
    subnet-router:  10.33.0.0/16

The route assignments were deduced from the golden data by analyzing
which router nodes receive FilterRules for which destination CIDRs
across all test files, and cross-referenced with the MTS setup
script (setup_grant_nodes.sh).

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
927ce418d2 policy/v2: use bare IPs in autogroup:self DstPorts
Use ip.String() instead of netip.PrefixFrom(ip, ip.BitLen()).String()
when building DstPorts for autogroup:self destinations. This produces
bare IPs like "100.90.199.68" instead of CIDR notation like
"100.90.199.68/32", matching the Tailscale FilterRule wire format.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
93d79d8da9 policy: include IPv6 in identity-based alias resolution
AppendToIPSet now adds both IPv4 and IPv6 addresses for nodes, matching Tailscale's FilterRule wire format where identity-based aliases (tags, users, groups, autogroups) resolve to both address families. Update ReduceFilterRules test expectations to include IPv6 entries.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
500442c8f1 policy/v2: convert routes compat tests to data-driven format with Tailscale SaaS captures
Replace 8,286 lines of inline Go test expectations with 92 JSON golden files captured from Tailscale SaaS. The data-driven test driver validates route filtering, auto-approval, HA routing, and exit node behavior against real Tailscale output.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
2fb71690e8 policy/v2: convert ACL compat tests to data-driven format with Tailscale SaaS captures
Replace 9,937 lines of inline Go test expectations with 215 JSON golden files captured from Tailscale SaaS. The new data-driven test driver compares headscale's filter compilation output against real Tailscale behavior for each node in an 8-node topology.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
9f7aa55689 policy/v2: refactor alias resolution to use ResolvedAddresses
Introduce ResolvedAddresses type for structured IP set results. Refactor all Alias.Resolve() methods to return ResolvedAddresses instead of raw IPSets. Restrict identity-based aliases to matching address families, fix nil dereferences in partial resolution paths, and update test expectations for the new IP format (bare IPs, IP ranges instead of CIDR prefixes).

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
0fa9dcaff8 policy/v2: add data-driven grants compatibility test with Tailscale SaaS captures
Rename tailscale_compat_test.go to tailscale_acl_compat_test.go to make room for the grants compat test. Add 237 GRANT-*.json golden test files captured from Tailscale SaaS and a data-driven test driver that compares headscale's grant filter compilation against real Tailscale behavior.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
f74ea5b8ed hscontrol/policy/v2: add Grant policy format support
Add support for the Grant policy format as an alternative to ACL format,
following Tailscale's policy v2 specification. Grants provide a more
structured way to define network access rules with explicit separation
of IP-based and capability-based permissions.

Key changes:

- Add Grant struct with Sources, Destinations, InternetProtocols (ip),
  and App (capabilities) fields
- Add ProtocolPort type for unmarshaling protocol:port strings
- Add Grant validation in Policy.validate() to enforce:
  - Mutual exclusivity of ip and app fields
  - Required ip or app field presence
  - Non-empty sources and destinations
- Refactor compileFilterRules to support both ACLs and Grants
- Convert ACLs to Grants internally via aclToGrants() for unified
  processing
- Extract destinationsToNetPortRange() helper for cleaner code
- Rename parseProtocol() to toIANAProtocolNumbers() for clarity
- Add ProtocolNumberToName mapping for reverse lookups

The Grant format allows policies to be written using either the legacy
ACL format or the new Grant format. ACLs are converted to Grants
internally, ensuring backward compatibility while enabling the new
format's benefits.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
53b8a81d48 servertest: support tagged pre-auth keys in test clients
WithTags was defined but never passed through to CreatePreAuthKey.
Fix NewClient to use CreateTaggedPreAuthKey when tags are specified,
enabling tests that need tagged nodes (e.g. via grant steering).

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
15c1cfd778 types: include ExitRoutes in HasNetworkChanges
When exit routes are approved, SubnetRoutes remains empty because exit
routes (0.0.0.0/0, ::/0) are classified separately. Without checking
ExitRoutes, the PolicyManager cache is not invalidated on exit route
approval, causing stale filter rules that lack via grant entries for
autogroup:internet destinations.

Updates #2180
2026-04-01 14:10:42 +01:00
Kristoffer Dalby
a76b4bd46c ci: switch integration tests to ARM runners
Switch all integration test jobs (build, build-postgres, test
template) from ubuntu-latest (x86_64) to ubuntu-24.04-arm (aarch64).

ARM runners on GitHub Actions are free for public repos and tend
to have more consistent performance characteristics than the
shared x86_64 pool. This should reduce flakiness caused by
resource contention on congested runners.

Updates #3125
2026-03-31 22:06:25 +02:00
Kristoffer Dalby
a9a2001ae7 integration: scale remaining hardcoded timeouts and replace pingAllHelper
Apply CI-aware scaling to all remaining hardcoded timeouts:

- requireAllClientsOfflineStaged: scale the three internal stage
  timeouts (15s/20s/60s) with ScaledTimeout.
- validateReloginComplete: scale requireAllClientsOnline (120s)
  and requireAllClientsNetInfoAndDERP (3min) calls.
- WaitForTailscaleSyncPerUser callers in acl_test.go (3 sites, 60s).
- WaitForRunning callers in tags_test.go (10 sites): switch to
  PeerSyncTimeout() to match convention.
- WaitForRunning/WaitForPeers direct callers in route_test.go.
- requireAllClientsOnline callers in general_test.go and
  auth_key_test.go.

Replace pingAllHelper with assertPingAll/assertPingAllWithCollect:

- Wraps pings in EventuallyWithT so transient docker exec timeouts
  are retried instead of immediately failing the test.
- Timeout scales with the ping matrix size (2s per ping budget for
  2 full sweeps) so large tests get proportionally more time.
- Uses CollectT correctly, fixing the broken EventuallyWithT usage
  in TestEphemeral where the old t.Errorf bypassed CollectT.
- Follows the established assert*/assertWithCollect naming.

Updates #3125
2026-03-31 22:06:25 +02:00
Kristoffer Dalby
acb8cfc7ee integration: make docker execute and ping timeouts CI-aware
The default docker execute timeout (10s) is the root cause of
"dockertest command timed out" errors across many integration tests
on CI. On congested GitHub Actions runners, docker exec latency
alone can consume 2-5 seconds of this budget before the command
even starts inside the container.

Replace the hardcoded 10s constant with a function that returns
20s on CI, doubling the budget for all container commands
(tailscale status, headscale CLI, curl, etc.).

Similarly, scale the default tailscale ping timeout from 200ms to
400ms on CI. This doubles the per-attempt budget and the docker
exec timeout for pings (from 200ms*5=1s to 400ms*5=2s), giving
more headroom for docker exec overhead.

Updates #3125
2026-03-31 22:06:25 +02:00
Kristoffer Dalby
f1e5f1346d integration/acl: add tag verification step to TestACLTagPropagationPortSpecific
TestACLTagPropagationPortSpecific failed twice on CI because it jumped
from SetNodeTags directly to checking curl, without first verifying the
tag change was applied on the server. This races against server-side
processing.

Add a tag verification step (matching TestACLTagPropagation's pattern)
and bump the Step 4 timeout from 60s to 90s since port-specific filter
changes require both endpoints to process the new PacketFilter from
the MapResponse while the WireGuard tunnel stays up.

Updates #3125
2026-03-31 22:06:25 +02:00
Kristoffer Dalby
210f58f62e integration: use CI-scaled timeouts for all EventuallyWithT assertions
Wrap all 329 hardcoded EventuallyWithT timeouts across 12 test files
with integrationutil.ScaledTimeout(), which applies a 2x multiplier
on CI runners. This addresses the systemic issue where hardcoded
timeouts that work locally are insufficient under CI resource
contention.

Variable-based timeouts (propagationTime, assertTimeout in
route_test.go and totalWaitTime in auth_oidc_test.go) are wrapped
at their definition site so all downstream usages benefit.

The retry intervals (second duration parameter) are intentionally
NOT scaled, as they control polling frequency, not total wait time.

Updates #3125
2026-03-31 22:06:25 +02:00
Kristoffer Dalby
a147b0cd87 integration/acl: use CurlFailFast for all negative curl assertions
Replace Curl() with CurlFailFast() in all negative curl assertions
(where the test expects the connection to fail). CurlFailFast uses
1 retry and 2s max time instead of 3 retries and 5s max, which
avoids wasting time on unnecessary retries when we expect the
connection to be blocked.

This affects 21 call sites across 7 test functions:

- TestACLAllowUser80Dst
- TestACLDenyAllPort80
- TestACLAllowUserDst
- TestACLAllowStarDst
- TestACLNamedHostsCanReach
- TestACLDevice1CanAccessDevice2
- TestPolicyUpdateWhileRunningWithCLIInDatabase
- TestACLAutogroupSelf
- TestACLPolicyPropagationOverTime

Where possible, the inline Curl+Error pattern is replaced with the
assertCurlFailWithCollect helper introduced in the previous commit.

Updates #3125
2026-03-31 22:06:25 +02:00
Kristoffer Dalby
a7edcf3b0f integration: add CI-scaled timeouts and curl helpers for flaky ACL tests
Add ScaledTimeout to scale EventuallyWithT timeouts by 2x on CI,
consistent with the existing PeerSyncTimeout (60s/120s) and
dockertestMaxWait (300s/600s) conventions.

Add assertCurlSuccessWithCollect and assertCurlFailWithCollect helpers
following the existing *WithCollect naming convention.
assertCurlFailWithCollect uses CurlFailFast internally for aggressive
timeouts, avoiding wasted retries when expecting blocked connections.

Apply these to the three flakiest ACL tests:

- TestACLTagPropagation: swap NetMap and curl verification order so
  the fast NetMap check (confirms MapResponse arrived) runs before
  the slower curl check. Use curl helpers and scaled timeouts.

- TestACLTagPropagationPortSpecific: use curl helpers and scaled
  timeouts.

- TestACLHostsInNetMapTable: scale the 10s EventuallyWithT timeout.

Updates #3125
2026-03-31 22:06:25 +02:00
Kristoffer Dalby
fda72ad1a3 Update main.md
Co-authored-by: nblock <nblock@users.noreply.github.com>
2026-03-31 13:36:31 +02:00
Kristoffer Dalby
dfaf120f2a docs: add development builds install page
Move the container image and binary download details from the README
into a dedicated documentation page at setup/install/main. This gives
development builds a proper home in the docs site alongside the other
install methods. The README now links to the docs page instead.
2026-03-31 13:36:31 +02:00
Kristoffer Dalby
e171d30179 ci: add build workflow for main branch
Build and push multi-arch container images (linux/amd64, linux/arm64)
to GHCR and Docker Hub on every push to main that changes Go or Nix
files. Images are tagged as main-<short-sha> using ko with the same
distroless base image as release builds.

Cross-compiled binaries for linux and darwin (amd64, arm64) are
uploaded as workflow artifacts. The README links to these via
nightly.link for stable download URLs.
2026-03-31 13:36:31 +02:00
Kristoffer Dalby
0c6b9f5348 goreleaser: remove unused ts2019 build tag
The ts2019 build tag is no longer used. Remove it from the
goreleaser build configuration.
2026-03-31 13:36:31 +02:00
Florian Preinstorfer
f3512d50df Switch to mkdocs-materialx
The project mkdocs-material is in maintenance-only mode and their
successor is not ready yet.

Use the modern, refreshed theme and drop the pymdownx.magiclink
extension.
2026-03-25 22:30:03 +01:00
Florian Preinstorfer
efd83da14e Explicitly mention that a headscale username should *not* end with @
See: #3149
2026-03-20 19:44:33 +01:00
Tanayk07
568baf3d02 fix: align banner right-side border to consistent 64-char width 2026-03-19 07:08:35 +01:00
Tanayk07
5105033224 feat: add prominent warning banner for non-standard IP prefixes
Add a highly visible ASCII-art warning banner that is printed at
startup when the configured IP prefixes fall outside the standard
Tailscale CGNAT (100.64.0.0/10) or ULA (fd7a:115c:a1e0::/48) ranges.

The warning fires once even if both v4 and v6 are non-standard, and
the warnBanner() function is reusable for other critical configuration
warnings in the future.

Also updates config-example.yaml to clarify that subsets of the
default ranges are fine, but ranges outside CGNAT/ULA are not.

Closes #3055
2026-03-19 07:08:35 +01:00
Kristoffer Dalby
3d53f97c82 hscontrol/servertest: fix test expectations for eventual consistency
Three corrections to issue tests that had wrong assumptions about
when data becomes available:

1. initial_map_should_include_peer_online_status: use WaitForCondition
   instead of checking the initial netmap. Online status is set by
   Connect() which sends a PeerChange patch after the initial
   RegisterResponse, so it may not be present immediately.

2. disco_key_should_propagate_to_peers: use WaitForCondition. The
   DiscoKey is sent in the first MapRequest (not RegisterRequest),
   so peers may not see it until a subsequent map update.

3. approved_route_without_announcement: invert the test expectation.
   Tailscale uses a strict advertise-then-approve model -- routes are
   only distributed when the node advertises them (Hostinfo.RoutableIPs)
   AND they are approved. An approval without advertisement is a dormant
   pre-approval. The test now asserts the route does NOT appear in
   AllowedIPs, matching upstream Tailscale semantics.

Also fix TestClient.Reconnect to clear the cached netmap and drain
pending updates before re-registering. Without this, WaitForPeers
returned immediately based on the old session's stale data.
2026-03-19 07:05:58 +01:00
Kristoffer Dalby
1053fbb16b hscontrol/state: fix online status reset during re-registration
Two fixes to how online status is handled during registration:

1. Re-registration (applyAuthNodeUpdate, HandleNodeFromPreAuthKey) no
   longer resets IsOnline to false. Online status is managed exclusively
   by Connect()/Disconnect() in the poll session lifecycle. The reset
   caused a false offline blip: the auth handler's change notification
   triggered a map regeneration showing the node as offline to peers,
   even though Connect() would set it back to true moments later.

2. New node creation (createAndSaveNewNode) now explicitly sets
   IsOnline=false instead of leaving it nil. This ensures peers always
   receive a known online status rather than an ambiguous nil/unknown.
2026-03-19 07:05:58 +01:00
Kristoffer Dalby
b09af3846b hscontrol/poll,state: fix grace period disconnect TOCTOU race
When a node disconnects, serveLongPoll defers a cleanup that starts a
grace period goroutine. This goroutine polls batcher.IsConnected() and,
if the node has not reconnected within ~10 seconds, calls
state.Disconnect() to mark it offline. A TOCTOU race exists: the node
can reconnect (calling Connect()) between the IsConnected check and
the Disconnect() call, causing the stale Disconnect() to overwrite
the new session's online status.

Fix with a monotonic per-node generation counter:

- State.Connect() increments the counter and returns the current
  generation alongside the change list.
- State.Disconnect() accepts the generation from the caller and
  rejects the call if a newer generation exists, making stale
  disconnects from old sessions a no-op.
- serveLongPoll captures the generation at Connect() time and passes
  it to Disconnect() in the deferred cleanup.
- RemoveNode's return value is now checked: if another session already
  owns the batcher slot (reconnect happened), the old session skips
  the grace period entirely.

Update batcher_test.go to track per-node connect generations and
pass them through to Disconnect(), matching production behavior.

Fixes the following test failures:
- server_state_online_after_reconnect_within_grace
- update_history_no_false_offline
- nodestore_correct_after_rapid_reconnect
- rapid_reconnect_peer_never_sees_offline
2026-03-19 07:05:58 +01:00
Kristoffer Dalby
00c41b6422 hscontrol/servertest: add race, stress, and poll race tests
Add three test files designed to stress the control plane under
concurrent and adversarial conditions:

- race_test.go: 14 tests exercising concurrent mutations, session
  replacement, batcher contention, NodeStore access, and map response
  delivery during disconnect. All pass the Go race detector.

- poll_race_test.go: 8 tests targeting the poll.go grace period
  interleaving. These confirm a logical TOCTOU race: when a node
  disconnects and reconnects within the grace period, the old
  session's deferred Disconnect() can overwrite the new session's
  Connect(), leaving IsOnline=false despite an active poll session.

- stress_test.go: sustained churn, rapid mutations, rolling
  replacement, data integrity checks under load, and verification
  that rapid reconnects do not leak false-offline notifications.

Known failing tests (grace period TOCTOU race):
- server_state_online_after_reconnect_within_grace
- update_history_no_false_offline
- rapid_reconnect_peer_never_sees_offline
2026-03-19 07:05:58 +01:00
Kristoffer Dalby
ab4e205ce7 hscontrol/servertest: expand issue tests to 24 scenarios, surface 4 issues
Split TestIssues into 7 focused test functions to stay under cyclomatic
complexity limits while testing more aggressively.

Issues surfaced (4 failing tests):

1. initial_map_should_include_peer_online_status: Initial MapResponse
   has Online=nil for peers. Online status only arrives later via
   PeersChangedPatch.

2. disco_key_should_propagate_to_peers: DiscoPublicKey set by client
   is not visible to peers. Peers see zero disco key.

3. approved_route_without_announcement_is_visible: Server-side route
   approval without client-side announcement silently produces empty
   SubnetRoutes (intersection of empty announced + approved = empty).

4. nodestore_correct_after_rapid_reconnect: After 5 rapid reconnect
   cycles, NodeStore reports node as offline despite having an active
   poll session. The connect/disconnect grace period interleaving
   leaves IsOnline in an incorrect state.

Passing tests (20) verify:
- IP uniqueness across 10 nodes
- IP stability across reconnect
- New peers have addresses immediately
- Node rename propagates to peers
- Node delete removes from all peer lists
- Hostinfo changes (OS field) propagate
- NodeStore/DB consistency after route mutations
- Grace period timing (8-20s window)
- Ephemeral node deletion (not just offline)
- 10-node simultaneous connect convergence
- Rapid sequential node additions
- Reconnect produces complete map
- Cross-user visibility with default policy
- Same-user multiple nodes get distinct IDs
- Same-hostname nodes get unique GivenNames
- Policy change during connect still converges
- DERP region references are valid
- User profiles present for self and peers
- Self-update arrives after route approval
- Route advertisement stored as AnnouncedRoutes
2026-03-19 07:05:58 +01:00
Kristoffer Dalby
f87b08676d hscontrol/servertest: add policy, route, ephemeral, and content tests
Extend the servertest harness with:
- TestClient.Direct() accessor for advanced operations
- TestClient.WaitForPeerCount and WaitForCondition helpers
- TestHarness.ChangePolicy for ACL policy testing
- AssertDERPMapPresent and AssertSelfHasAddresses

New test suites:
- content_test.go: self node, DERP map, peer properties, user profiles,
  update history monotonicity, and endpoint update propagation
- policy_test.go: default allow-all, explicit policy, policy triggers
  updates on all nodes, multiple policy changes, multi-user mesh
- ephemeral_test.go: ephemeral connect, cleanup after disconnect,
  mixed ephemeral/regular, reconnect prevents cleanup
- routes_test.go: addresses in AllowedIPs, route advertise and approve,
  advertised routes via hostinfo, CGNAT range validation

Also fix node_departs test to use WaitForCondition instead of
assert.Eventually, and convert concurrent_join_and_leave to
interleaved_join_and_leave with grace-period-tolerant assertions.
2026-03-19 07:05:58 +01:00
Kristoffer Dalby
ca7362e9aa hscontrol/servertest: add control plane lifecycle and consistency tests
Add three test files exercising the servertest harness:

- lifecycle_test.go: connection, disconnection, reconnection, session
  replacement, and mesh formation at various sizes.
- consistency_test.go: symmetric visibility, consistent peer state,
  address presence, concurrent join/leave convergence.
- weather_test.go: rapid reconnects, flapping stability, reconnect
  with various delays, concurrent reconnects, and scale tests.

All tests use table-driven patterns with subtests.
2026-03-19 07:05:58 +01:00
Kristoffer Dalby
0288614bdf hscontrol: add servertest harness for in-process control plane testing
Add a new hscontrol/servertest package that provides a test harness
for exercising the full Headscale control protocol in-process, using
Tailscale's controlclient.Direct as the client.

The harness consists of:
- TestServer: wraps a Headscale instance with an httptest.Server
- TestClient: wraps controlclient.Direct with NetworkMap tracking
- TestHarness: orchestrates N clients against a single server
- Assertion helpers for mesh completeness, visibility, and consistency

Export minimal accessor methods on Headscale (HTTPHandler, NoisePublicKey,
GetState, SetServerURL, StartBatcher, StartEphemeralGC) so the servertest
package can construct a working server from outside the hscontrol package.

This enables fast, deterministic tests of connection lifecycle, update
propagation, and network weather scenarios without Docker.
2026-03-19 07:05:58 +01:00
Kristoffer Dalby
82c7efccf8 mapper/batcher: serialize per-node work to prevent out-of-order delivery
processBatchedChanges queued each pending change for a node as a
separate work item. Since multiple workers pull from the same channel,
two changes for the same node could be processed concurrently by
different workers. This caused two problems:

1. MapResponses delivered out of order — a later change could finish
   generating before an earlier one, so the client sees stale state.
2. updateSentPeers and computePeerDiff race against each other —
   updateSentPeers does Clear() + Store() which is not atomic relative
   to a concurrent Range() in computePeerDiff.

Bundle all pending changes for a node into a single work item so one
worker processes them sequentially. Add a per-node workMu that
serializes processing across consecutive batch ticks, preventing a
second worker from starting tick N+1 while tick N is still in progress.

Fixes #3140
2026-03-19 07:05:58 +01:00
Kristoffer Dalby
81b871c9b5 integration/acl: replace custom entrypoints with WithPackages
Replace inline WithDockerEntrypoint shell scripts in
TestACLTagPropagation and TestACLTagPropagationPortSpecific with
the standard WithPackages and WithWebserver options.

The custom entrypoints used fragile fixed sleeps and lacked the
robust network/cert readiness waits that buildEntrypoint provides.

Updates #3139
2026-03-16 03:57:05 -07:00
Kristoffer Dalby
e5ebe3205a integration: standardize test infrastructure options
Make embedded DERP server and TLS the default configuration for all
integration tests, replacing the per-test opt-in model that led to
inconsistent and flaky test behavior.

Infrastructure changes:
- DefaultConfigEnv() includes embedded DERP server settings
- New() auto-generates a proper CA + server TLS certificate pair
- CA cert is installed into container trust stores and returned by
  GetCert() so clients and internal tools (curl) trust the server
- CreateCertificate() now returns (caCert, cert, key) instead of
  discarding the CA certificate
- Add WithPublicDERP() and WithoutTLS() opt-out options
- Remove WithTLS(), WithEmbeddedDERPServerOnly(), and WithDERPAsIP()
  since all their behavior is now the default or unnecessary

Test cleanup:
- Remove all redundant WithTLS/WithEmbeddedDERPServerOnly/WithDERPAsIP
  calls from test files
- Give every test a unique WithTestName by parameterizing aclScenario,
  sshScenario, and derpServerScenario helpers
- Add WithTestName to tests that were missing it
- Document all non-standard options with inline comments explaining
  why each is needed

Updates #3139
2026-03-16 03:57:05 -07:00
Kristoffer Dalby
87b8507ac9 mapper/batcher: replace connected map with per-node disconnectedAt
The Batcher's connected field (*xsync.Map[types.NodeID, *time.Time])
encoded three states via pointer semantics:

  - nil value:    node is connected
  - non-nil time: node disconnected at that timestamp
  - key missing:  node was never seen

This was error-prone (nil meaning 'connected' inverts Go idioms),
redundant with b.nodes + hasActiveConnections(), and required keeping
two parallel maps in sync. It also contained a bug in RemoveNode where
new(time.Now()) was used instead of &now, producing a zero time.

Replace the separate connected map with a disconnectedAt field on
multiChannelNodeConn (atomic.Pointer[time.Time]), tracked directly
on the object that already manages the node's connections.

Changes:
  - Add disconnectedAt field and helpers (markConnected, markDisconnected,
    isConnected, offlineDuration) to multiChannelNodeConn
  - Remove the connected field from Batcher
  - Simplify IsConnected from two map lookups to one
  - Simplify ConnectedMap and Debug from two-map iteration to one
  - Rewrite cleanupOfflineNodes to scan b.nodes directly
  - Remove the markDisconnectedIfNoConns helper
  - Update all tests and benchmarks

Fixes #3141
2026-03-16 02:22:56 -07:00
Kristoffer Dalby
60317064fd mapper/batcher: serialize per-node work to prevent out-of-order delivery
processBatchedChanges queued each pending change for a node as a
separate work item. Since multiple workers pull from the same channel,
two changes for the same node could be processed concurrently by
different workers. This caused two problems:

1. MapResponses delivered out of order — a later change could finish
   generating before an earlier one, so the client sees stale state.
2. updateSentPeers and computePeerDiff race against each other —
   updateSentPeers does Clear() + Store() which is not atomic relative
   to a concurrent Range() in computePeerDiff.

Bundle all pending changes for a node into a single work item so one
worker processes them sequentially. Add a per-node workMu that
serializes processing across consecutive batch ticks, preventing a
second worker from starting tick N+1 while tick N is still in progress.

Fixes #3140
2026-03-16 02:22:46 -07:00
Juan Font
4d427cfe2a noise: limit request body size to prevent unauthenticated OOM
The Noise handshake accepts any machine key without checking
registration, so all endpoints behind the Noise router are reachable
without credentials. Three handlers used io.ReadAll without size
limits, allowing an attacker to OOM-kill the server.

Fix:
- Add http.MaxBytesReader middleware (1 MiB) on the Noise router.
- Replace io.ReadAll + json.Unmarshal with json.NewDecoder in
  PollNetMapHandler and RegistrationHandler.
- Stop reading the body in NotImplementedHandler entirely.
2026-03-16 09:28:31 +01:00
Kristoffer Dalby
afd3a6acbc mapper/batcher: remove disabled X-prefixed test functions
Remove XTestBatcherChannelClosingRace (~95 lines) and
XTestBatcherScalability (~515 lines). These were disabled by
prefixing with X (making them invisible to go test) and served
as dead code. The functionality they covered is exercised by the
active test suite.

Updates #2545
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
feaf85bfbc mapper/batcher: clean up test constants and output
L8: Rename SCREAMING_SNAKE_CASE test constants to idiomatic Go
camelCase. Remove highLoad* and extremeLoad* constants that were
only referenced by disabled (X-prefixed) tests.

L10: Fix misleading assert message that said "1337" while checking
for region ID 999.

L12: Remove emoji from test log output to avoid encoding issues
in CI environments.

Updates #2545
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
86e279869e mapper/batcher: minor production code cleanup
L1: Replace crypto/rand with an atomic counter for generating
connection IDs. These identifiers are process-local and do not need
cryptographic randomness; a monotonic counter is cheaper and
produces shorter, sortable IDs.

L5: Use getActiveConnectionCount() in Debug() instead of directly
locking the mutex and reading the connections slice. This avoids
bypassing the accessor that already exists for this purpose.

L6: Extract the hardcoded 15*time.Minute cleanup threshold into
the named constant offlineNodeCleanupThreshold.

L7: Inline the trivial addWork wrapper; AddWork now calls addToBatch
directly.

Updates #2545
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
7881f65358 mapper: extract node connection types to node_conn.go
Move connectionEntry, multiChannelNodeConn, generateConnectionID, and
all their methods from batcher.go into a dedicated file. This reduces
batcher.go from ~1170 lines to ~800 and separates per-node connection
management from batcher orchestration.

Pure move — no logic changes.

Updates #2545
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
2d549e579f mapper/batcher: add regression tests for M1, M3, M7 fixes
- TestBatcher_CloseBeforeStart_DoesNotHang: verifies Close() before
  Start() returns promptly now that done is initialized in NewBatcher.

- TestBatcher_QueueWorkAfterClose_DoesNotHang: verifies queueWork
  returns via the done channel after Close(), even without Start().

- TestIsConnected_FalseAfterAddNodeFailure: verifies IsConnected
  returns false after AddNode fails and removes the last connection.

- TestRemoveConnectionAtIndex_NilsTrailingSlot: verifies the backing
  array slot is nil-ed after removal to avoid retaining pointers.

Updates #2545
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
50e8b21471 mapper/batcher: fix pointer retention, done-channel init, and connected-map races
M7: Nil out trailing *connectionEntry pointers in the backing array
after slice removal in removeConnectionAtIndexLocked and send().
Without this, the GC cannot collect removed entries until the slice
is reallocated.

M1: Initialize the done channel in NewBatcher instead of Start().
Previously, calling Close() or queueWork before Start() would select
on a nil channel, blocking forever. Moving the make() to the
constructor ensures the channel is always usable.

M2: Move b.connected.Delete and b.totalNodes decrement inside the
Compute callback in cleanupOfflineNodes. Previously these ran after
the Compute returned, allowing a concurrent AddNode to reconnect
between the delete and the bookkeeping update, which would wipe the
fresh connected state.

M3: Call markDisconnectedIfNoConns on AddNode error paths. Previously,
when initial map generation or send timed out, the connection was
removed but b.connected retained its old nil (= connected) value,
making IsConnected return true for a node with zero connections.

Updates #2545
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
8e26651f2c mapper/batcher: add regression tests for timer leak and Close lifecycle
Add four unit tests guarding fixes introduced in recent commits:

- TestConnectionEntry_SendFastPath_TimerStopped: verifies the
  time.NewTimer fix (H1) does not leak goroutines after many
  fast-path sends on a buffered channel.

- TestBatcher_CloseWaitsForWorkers: verifies Close() blocks until all
  worker goroutines exit (H3), preventing sends on torn-down channels.

- TestBatcher_CloseThenStartIsNoop: verifies the one-shot lifecycle
  contract; Start() after Close() must not spawn new goroutines.

- TestBatcher_CloseStopsTicker: verifies Close() stops the internal
  ticker to prevent resource leaks.

Updates #2545
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
57a38b5678 mapper/batcher: reduce hot-path log verbosity
Remove Caller(), channel pointer formatting (fmt.Sprintf("%p",...)),
and mutex timing from send(), addConnection(), and
removeConnectionByChannel(). Move per-broadcast summary and
no-connection logs from Debug to Trace. Remove per-connection
"attempting"/"succeeded" logs entirely; keep Warn for failures.

These methods run on every MapResponse delivery, so the savings
compound quickly under load.

Updates #2545
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
051a38a4c4 mapper/batcher: track worker goroutines and stop ticker on Close
Close() previously closed the done channel and returned immediately,
without waiting for worker goroutines to exit. This caused goroutine
leaks in tests and allowed workers to race with connection teardown.
The ticker was also never stopped, leaking its internal goroutine.

Add a sync.WaitGroup to track the doWork goroutine and every worker
it spawns. Close() now calls wg.Wait() after signalling shutdown,
ensuring all goroutines have exited before tearing down connections.
Also stop the ticker to prevent resource leaks.

Document that a Batcher must not be reused after Close().
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
3276bda0c0 mapper/batcher: replace time.After with NewTimer to avoid timer leak
connectionEntry.send() is on the hot path: called once per connection
per broadcast tick. time.After allocates a timer that sits in the
runtime timer heap until it fires (50 ms), even when the channel send
succeeds immediately. At 1000 connected nodes, every tick leaks 1000
timers into the heap, creating continuous GC pressure.

Replace with time.NewTimer + defer timer.Stop() so the timer is
removed from the heap as soon as the fast-path send completes.
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
ebc57d9a38 integration/acl: fix TestACLPolicyPropagationOverTime infrastructure
Add embedded DERP server, TLS, and netfilter=off to match the
infrastructure configuration used by all other ACL integration tests.

Without these options, the test fails intermittently because traffic
routes through external DERP relays and iptables initialization fails
in Docker containers.

Updates #3139
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
2058343ad6 mapper: remove Batcher interface, rename to Batcher struct
Remove the Batcher interface since there is only one implementation.
Rename LockFreeBatcher to Batcher and merge batcher_lockfree.go into
batcher.go.

Drop type assertions in debug.go now that mapBatcher is a concrete
*mapper.Batcher pointer.
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
9b24a39943 mapper/batcher: add scale benchmarks
Add benchmarks that systematically test node counts from 100 to
50,000 to identify scaling limits and validate performance under
load.
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
3ebe4d99c1 mapper/batcher: reduce lock contention with two-phase send
Rewrite multiChannelNodeConn.send() to use a two-phase approach:
1. RLock: snapshot connections slice (cheap pointer copy)
2. Unlock: send to all connections (50ms timeouts happen here)
3. Lock: remove failed connections by pointer identity

Previously, send() held the write lock for the entire duration of
sending to all connections. With N stale connections each timing out
at 50ms, this blocked addConnection/removeConnection for N*50ms.
The two-phase approach holds the lock only for O(N) pointer
operations, not for N*50ms I/O waits.
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
da33795e79 mapper/batcher: fix race conditions in cleanup and lookups
Replace the two-phase Load-check-Delete in cleanupOfflineNodes with
xsync.Map.Compute() for atomic check-and-delete. This prevents the
TOCTOU race where a node reconnects between the hasActiveConnections
check and the Delete call.

Add nil guards on all b.nodes.Load() and b.nodes.Range() call sites
to prevent nil pointer panics from concurrent cleanup races.
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
57070680a5 mapper/batcher: restructure internals for correctness
Move per-node pending changes from a shared xsync.Map on the batcher
into multiChannelNodeConn, protected by a dedicated mutex. The new
appendPending/drainPending methods provide atomic append and drain
operations, eliminating data races in addToBatch and
processBatchedChanges.

Add sync.Once to multiChannelNodeConn.close() to make it idempotent,
preventing panics from concurrent close calls on the same channel.

Add started atomic.Bool to guard Start() against being called
multiple times, preventing orphaned goroutines.

Add comprehensive concurrency tests validating these changes.
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
21e02e5d1f mapper/batcher: add unit tests and benchmarks
Add comprehensive unit tests for the LockFreeBatcher covering
AddNode/RemoveNode lifecycle, addToBatch routing (broadcast, targeted,
full update), processBatchedChanges deduplication, cleanup of offline
nodes, close/shutdown behavior, IsConnected state tracking, and
connected map consistency.

Add benchmarks for connection entry send, multi-channel send and
broadcast, peer diff computation, sentPeers updates, addToBatch at
various scales (10/100/1000 nodes), processBatchedChanges, broadcast
delivery, IsConnected lookups, connected map enumeration, connection
churn, and concurrent send+churn scenarios.

Widen setupBatcherWithTestData to accept testing.TB so benchmarks can
reuse the same database-backed test setup as unit tests.
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
2f94b80e70 go.mod: add stress tool dependency
Add golang.org/x/tools/cmd/stress as a tool dependency for running
tests under repeated stress to surface flaky failures.

Update flake vendorHash for the new go.mod dependencies.
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
3e0a96ec3a all: fix test flakiness and improve test infrastructure
Buffer the AuthRequest verdict channel to prevent a race where the
sender blocks indefinitely if the receiver has already timed out, and
increase the auth followup test timeout from 100ms to 5s to prevent
spurious failures under load.

Skip postgres-backed tests when the postgres server is unavailable
instead of calling t.Fatal, which was preventing the rest of the test
suite from running.

Add TestMain to db, types, and policy/v2 packages to chdir to the
source directory before running tests. This ensures relative testdata/
paths resolve correctly when the test binary is executed from an
arbitrary working directory (e.g., via "go tool stress").
2026-03-14 02:52:28 -07:00
DM
fffc58b5d0 poll: fix poll test linter violations 2026-03-12 01:27:34 -07:00
DM
4aca9d6568 poll: stop stale map sessions through an explicit teardown hook
When stale-send cleanup prunes a connection from the batcher, the old serveLongPoll session needs an explicit stop signal. Pass a stop hook into AddNode and trigger it when that connection is removed, so the session exits through its normal cancel path instead of relying on channel closure from the batcher side.
2026-03-12 01:27:34 -07:00
DM
3daf45e88a mapper: close stale map channels after send timeouts
When the batcher timed out sending to a node, it removed the channel from multiChannelNodeConn but left the old serveLongPoll goroutine running on that channel. That left a live stale session behind: it no longer received new updates, but it could still keep the stream open and block shutdown.

Close the pruned channel when stale-send cleanup removes it so the old map session exits after draining any buffered update.
2026-03-12 01:27:34 -07:00
DM
b81d6c734d mapper: handle RemoveNode after channel cleanup
A connection can already be removed from multiChannelNodeConn by the stale-send cleanup path before serveLongPoll reaches its deferred RemoveNode call. In that case RemoveNode used to return early on "channel not found" and never updated the node's connected state.

Drop that early return so RemoveNode still checks whether any active connections remain and marks the node disconnected when the last one is gone.
2026-03-12 01:27:34 -07:00
Kristoffer Dalby
c5ef1d3bb9 nix: upgrade dev shell to Python 3.14
Update mdformat and related packages from python313Packages to
python314Packages. All four packages (mdformat, mdformat-footnote,
mdformat-frontmatter, mdformat-mkdocs) are available in the updated
nixpkgs.

Updates #1261
2026-03-11 03:18:14 -07:00
Kristoffer Dalby
542cdb2cb2 all: update Go to 1.26.1
Bump Go version from 1.26.0 to 1.26.1 across go.mod, Dockerfiles,
and the integration test runner fallback defaults.

Updates #1261
2026-03-11 03:18:14 -07:00
Kristoffer Dalby
5e33259550 nix: update flake inputs
Update nixpkgs from 2026-02-15 (ac055f38) to 2026-03-08 (608d0cad).

Updates #1261
2026-03-11 03:18:14 -07:00
Kristoffer Dalby
65880ecb58 nix: disable external DERP URL fetch in VM test
Explicitly set derp.urls to an empty list in the NixOS VM test,
matching the upstream nixpkgs test. The VMs have no internet
access, so fetching the default Tailscale DERP map would silently
fail and add unnecessary timeout delay to the test run.
2026-03-06 05:18:44 -08:00
Kristoffer Dalby
37c6a9e3a6 nix: sync module options and descriptions with upstream nixpkgs
Add missing typed options from the upstream nixpkgs module:
- configFile: read-only option exposing the generated config path
  for composability with other NixOS modules
- dns.split: split DNS configuration with proper type checking
- dns.extra_records: typed submodule with name/type/value validation

Sync descriptions and assertions with upstream:
- Use Tailscale doc link for override_local_dns description
- Remove redundant requirement note from nameservers.global
- Match upstream assertion message wording and expression style

Update systemd script to reference cfg.configFile instead of a
local let-binding, matching the upstream pattern.
2026-03-06 05:18:44 -08:00
DM
8423af2732 Swap favicon for updated version 2026-03-03 05:59:40 +01:00
Florian Preinstorfer
9baa795ddb Update docs for auth-id changes
- Replace "headscale nodes register" with "headscale auth register"
- Update from registration key to Auth ID
- Fix API example to register a node
2026-03-01 13:38:22 +01:00
Florian Preinstorfer
acddd73183 Reformat docs with mdformat 2026-03-01 09:24:52 +01:00
Florian Preinstorfer
47307d19cf Switch to mdformat to format docs
- Use mdformat and mdformat-mkdocs to format docs
- Add mdformat to Makefile and pre-commit-config
- Prettier ignores docs/
2026-03-01 09:24:52 +01:00
Kristoffer Dalby
5c449db125 ci: regenerate test-integration.yaml for TestSSHLocalpart
Updates #3049
2026-02-28 05:14:11 -08:00
Kristoffer Dalby
2be94ce19a integration: add TestSSHLocalpart integration test
Add end-to-end integration test that validates localpart:*@domain
SSH user mapping with real Tailscale clients. The test sets up an
SSH policy with localpart entries and verifies that users can SSH
into tagged servers using their email local-part as the username.

Updates #3049
2026-02-28 05:14:11 -08:00
Kristoffer Dalby
6c59d3e601 policy/v2: add SSH compatibility testdata from Tailscale SaaS
Add 39 test fixtures captured from Tailscale SaaS API responses
to validate SSH policy compilation parity. Each JSON file contains
the SSH policy section and expected compiled SSHRule arrays for 5
test nodes (3 user-owned, 2 tagged).

Test series: SSH-A (basic), SSH-B (specific sources), SSH-C
(destination combos), SSH-D (localpart), SSH-E (edge cases),
SSH-F (multi-rule), SSH-G (acceptEnv).

The data-driven TestSSHDataCompat harness uses cmp.Diff with
principal order tolerance but strict rule ordering (first-match-wins
semantics require exact order).

Updates #3049
2026-02-28 05:14:11 -08:00
Kristoffer Dalby
0acf09bdd2 policy/v2: add localpart:*@domain SSH user compilation
Add support for localpart:*@<domain> entries in SSH policy users.
When a user SSHes into a target, their email local-part becomes the
OS username (e.g. alice@example.com → OS user alice).

Type system (types.go):
- SSHUser.IsLocalpart() and ParseLocalpart() for validation
- SSHUsers.LocalpartEntries(), NormalUsers(), ContainsLocalpart()
- Enforces format: localpart:*@<domain> (wildcard-only)
- UserWildcard.Resolve for user:*@domain SSH source aliases
- acceptEnv passthrough for SSH rules

Compilation (filter.go):
- resolveLocalparts: pure function mapping users to local-parts
  by email domain. No node walking, easy to test.
- groupSourcesByUser: single walk producing per-user principals
  with sorted user IDs, and tagged principals separately.
- ipSetToPrincipals: shared helper replacing 6 inline copies.
- selfPrincipalsForNode: self-access using pre-computed byUser.

The approach separates data gathering from rule assembly. Localpart
rules are interleaved per source user to match Tailscale SaaS
first-match-wins ordering.

Updates #3049
2026-02-28 05:14:11 -08:00
QEDeD
414d3bbbd8 Fix typo in comment about fsnotify behavior
Correct loose (opposite of tight) to lose (opposite of keep).
2026-02-27 15:23:06 +01:00
Stefan Bethke
0f12e414a6 Explain one approach to update OIDC provider info
See #3112
2026-02-27 10:06:50 +01:00
Stefan Bethke
df339cd290 Add a link to Authentik's integration guide 2026-02-27 09:56:18 +01:00
DM
610c1daa4d types: avoid NodeView clone in CanAccess
NodeView.CanAccess called node2.AsStruct() on every check. In peer-map construction we run CanAccess in O(n^2) pair scans (often twice per pair), so that per-call clone multiplied into large heap churn
2026-02-26 19:15:07 -08:00
Kristoffer Dalby
84adda226b doc: add CHANGELOG entries for SSH check and auth commands
Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
0f97294665 ci: regenerate integration test workflow
Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
3db0a483ed integration: add SSH check mode tests
Add ReadLog method to headscale integration container for log
inspection. Split SSH check mode tests into CLI and OIDC variants
and add comprehensive test coverage:

- TestSSHOneUserToOneCheckModeCLI: basic check mode with CLI approval
- TestSSHOneUserToOneCheckModeOIDC: check mode with OIDC approval
- TestSSHCheckModeUnapprovedTimeout: rejection on cache expiry
- TestSSHCheckModeCheckPeriodCLI: session expiry and re-auth
- TestSSHCheckModeAutoApprove: auto-approval within check period
- TestSSHCheckModeNegativeCLI: explicit rejection via CLI

Update existing integration tests to use headscale auth register.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
7bab8da366 state, policy, noise: implement SSH check period auto-approval
Add SSH check period tracking so that recently authenticated users
are auto-approved without requiring manual intervention each time.

Introduce SSHCheckPeriod type with validation (min 1m, max 168h,
"always" for every request) and encode the compiled check period
as URL query parameters in the HoldAndDelegate URL.

The SSHActionHandler checks recorded auth times before creating a
new HoldAndDelegate flow. Auth timestamps are stored in-memory:
- Default period (no explicit checkPeriod): auth covers any
  destination, keyed by source node with Dst=0 sentinel
- Explicit period: auth covers only that specific destination,
  keyed by (source, destination) pair

Auth times are cleared on policy changes.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
48cc98b787 hscontrol, cli: add auth register and approve commands
Implement AuthRegister and AuthApprove gRPC handlers and add
corresponding CLI commands (headscale auth register, approve, reject)
for managing pending auth requests including SSH check approvals.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
61a14bb0e4 gen: regenerate from auth proto changes
Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
dc0e52a960 proto: add AuthRegister and AuthApprove RPCs
Add gRPC service definitions for managing auth requests:
AuthRegister to register interactive auth sessions and
AuthApprove/AuthReject to approve or deny pending requests
(used for SSH check mode).

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
107c2f2f70 policy, noise: implement SSH check action
Implement the SSH "check" action which requires additional
verification before allowing SSH access. The policy compiler generates
a HoldAndDelegate URL that the Tailscale client calls back to
headscale. The SSHActionHandler creates an auth session and waits for
approval via the generalised auth flow.

Sort check (HoldAndDelegate) rules before accept rules to match
Tailscale's first-match-wins evaluation order.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
4a7e1475c0 templates: generalise auth templates for web and OIDC
Extract shared HTML/CSS design into a common template and create
generalised auth success and web auth templates that work for both
node registration and SSH check authentication flows.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
cb3b6949ea auth: generalise auth flow and introduce AuthVerdict
Generalise the registration pipeline to a more general auth pipeline
supporting both node registrations and SSH check auth requests.
Rename RegistrationID to AuthID, unexport AuthRequest fields, and
introduce AuthVerdict to unify the auth finish API.

Add the urlParam generic helper for extracting typed URL parameters
from chi routes, used by the new auth request handler.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
30338441c1 app: switch from gorilla to chi mux
Replace gorilla/mux with go-chi/chi as the HTTP router and add a
custom zerolog-based request logger to replace chi's default
stdlib-based middleware.Logger, consistent with the rest of the
application.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
25ccb5a161 build: update golangci-lint and gopls in flake
Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
8048f10d13 hscontrol/state: extract findExistingNodeForPAK to reduce complexity
Extract the existing-node lookup logic from HandleNodeFromPreAuthKey
into a separate method. This reduces the cyclomatic complexity from
32 to 28, below the gocyclo limit of 30.

Updates #3077
2026-02-20 21:51:00 +01:00
Kristoffer Dalby
be4fd9ff2d integration: fix tag tests for tagged nodes with nil user_id
Tagged nodes no longer have user_id set, so ListNodes(user) cannot
find them. Update integration tests to use ListNodes() (all nodes)
when looking up tagged nodes.

Add a findNode helper to locate nodes by predicate from an
unfiltered list, used in ACL tests that have multiple nodes per
scenario.

Updates #3077
2026-02-20 21:51:00 +01:00
Kristoffer Dalby
1e4fc3f179 hscontrol: add tests for deleting users with tagged nodes
Test the tagged-node-survives-user-deletion scenario at two layers:

DB layer (users_test.go):
- success_user_only_has_tagged_nodes: tagged nodes with nil
  user_id do not block user deletion and survive it
- error_user_has_tagged_and_owned_nodes: user-owned nodes
  still block deletion even when tagged nodes coexist

App layer (grpcv1_test.go):
- TestDeleteUser_TaggedNodeSurvives: full registration flow
  with tagged PreAuthKey verifies nil UserID after registration,
  absence from nodesByUser index, user deletion succeeds, and
  tagged node remains in global node list

Also update auth_tags_test.go assertions to expect nil UserID
on tagged nodes, consistent with the new invariant.

Updates #3077
2026-02-20 21:51:00 +01:00
Kristoffer Dalby
894e6946dc hscontrol/types: regenerate types_view.go
make generate

Updates #3077
2026-02-20 21:51:00 +01:00
Kristoffer Dalby
75e56df9e4 hscontrol: enforce that tagged nodes never have user_id
Tagged nodes are owned by their tags, not a user. Enforce this
invariant at every write path:

- createAndSaveNewNode: do not set UserID for tagged PreAuthKey
  registration; clear UserID when advertise-tags are applied
  during OIDC/CLI registration
- SetNodeTags: clear UserID/User when tags are assigned
- processReauthTags: clear UserID/User when tags are applied
  during re-authentication
- validateNodeOwnership: reject tagged nodes with non-nil UserID
- NodeStore: skip nodesByUser indexing for tagged nodes since
  they have no owning user
- HandleNodeFromPreAuthKey: add fallback lookup for tagged PAK
  re-registration (tagged nodes indexed under UserID(0)); guard
  against nil User deref for tagged nodes in different-user check

Since tagged nodes now have user_id = NULL, ListNodesByUser
will not return them and DestroyUser naturally allows deleting
users whose nodes have all been tagged. The ON DELETE CASCADE
FK cannot reach tagged nodes through a NULL foreign key.

Also tone down shouty comments throughout state.go.

Fixes #3077
2026-02-20 21:51:00 +01:00
Kristoffer Dalby
52d454d0c8 hscontrol/db: add migration to clear user_id on tagged nodes
Tagged nodes are owned by their tags, not a user. Previously
user_id was kept as "created by" tracking, but this prevents
deleting users whose nodes have all been tagged, and the
ON DELETE CASCADE FK would destroy the tagged nodes.

Add a migration that sets user_id = NULL on all existing tagged
nodes. Subsequent commits enforce this invariant at write time.

Updates #3077
2026-02-20 21:51:00 +01:00
Kristoffer Dalby
f20bd0cf08 node: implement disable key expiry via CLI and API
Add --disable flag to "headscale nodes expire" CLI command and
disable_expiry field handling in the gRPC API to allow disabling
key expiry for nodes. When disabled, the node's expiry is set to
NULL and IsExpired() returns false.

The CLI follows the new grpcRunE/RunE/printOutput patterns
introduced in the recent CLI refactor.

Also fix NodeSetExpiry to persist directly to the database instead
of going through persistNodeToDB which omits the expiry field.

Fixes #2681

Co-authored-by: Marco Santos <me@marcopsantos.com>
2026-02-20 21:49:55 +01:00
Kristoffer Dalby
a8f7fedced proto: add disable_expiry field to ExpireNodeRequest
Add bool disable_expiry field (field 3) to ExpireNodeRequest proto
and regenerate all protobuf, gRPC gateway, and OpenAPI files.

Fixes #2681

Co-authored-by: Marco Santos <me@marcopsantos.com>
2026-02-20 21:49:55 +01:00
Kristoffer Dalby
b668c7a596 policy/v2: add policy unmarshal tests for bracketed IPv6
Add end-to-end test cases to TestUnmarshalPolicy that verify bracketed
IPv6 addresses are correctly parsed through the full policy pipeline
(JSON unmarshal -> splitDestinationAndPort -> parseAlias -> parsePortRange)
and survive JSON round-trips.

Cover single port, multiple ports, wildcard port, CIDR prefix, port
range, bracketed IPv4, and hostname rejection.

Updates #2754
2026-02-20 21:49:21 +01:00
Kristoffer Dalby
49744cd467 policy/v2: accept RFC 3986 bracketed IPv6 in ACL destinations
Headscale rejects IPv6 addresses with square brackets in ACL policy
destinations (e.g. "[fd7a:115c:a1e0::87e1]:80,443"), while Tailscale
SaaS accepts them. The root cause is that splitDestinationAndPort uses
strings.LastIndex(":") which leaves brackets on the destination string,
and netip.ParseAddr does not accept brackets.

Add a bracket-handling branch at the top of splitDestinationAndPort that
uses net.SplitHostPort for RFC 3986 parsing when input starts with "[".
The extracted host is validated with netip.ParseAddr/ParsePrefix to
ensure brackets are only accepted around IP addresses and CIDR prefixes,
not hostnames or other alias types like tags and groups.

Fixes #2754
2026-02-20 21:49:21 +01:00
Brandon Sprague
a0d6802d5b Fix minor formatting issue in FAQ
This just fixes a small issue I noticed reading the docs: the two 'scenarios' listed in the scaling section end up showing up as a numbered list of five items, instead of the desired two items + their descriptions.
2026-02-20 16:59:14 +01:00
Kristoffer Dalby
13ebea192c cmd/headscale/cli: remove nil resp guards and unexport HasMachineOutputFlag
Remove dead if-resp-nil checks in tagCmd and approveRoutesCmd; gRPC
returns either an error or a valid response, never (nil, nil).

Rename HasMachineOutputFlag to hasMachineOutputFlag since it has a
single internal caller in root.go.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
af777f44f4 cmd/headscale/cli: extract bypassDatabase helper and simplify policy file reads
Add bypassDatabase() to consolidate the repeated LoadServerConfig +
NewHeadscaleDatabase pattern in getPolicy and setPolicy.

Replace os.Open + io.ReadAll with os.ReadFile in setPolicy and
checkPolicy, removing the manual file-handle management.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
7460bec767 cmd/headscale/cli: move errMissingParameter and Error type to their users
Move errMissingParameter from users.go to utils.go alongside the
other shared sentinel errors; the variable is referenced by
api_key.go and preauthkeys.go.

Move the Error constant-error type from debug.go to mockoidc.go,
its only consumer.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
ca321d3c13 cmd/headscale/cli: use HeadscaleDateTimeFormat and util.Base10 consistently
Replace five hardcoded "2006-01-02 15:04:05" strings with the
HeadscaleDateTimeFormat constant already defined in utils.go.
Replace two literal 10 base arguments to strconv.FormatUint with
util.Base10 to match the convention in api_key.go and nodes.go.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
2765fd397f cmd/headscale/cli: drop dead flag-read error checks
Flags registered on a cobra.Command cannot fail to read at runtime;
GetString/GetUint64/GetStringSlice only error when the flag name is
unknown. The error-handling blocks for these calls are unreachable
dead code.

Adopt the value, _ := pattern already used in api_key.go,
preauthkeys.go and users.go, removing ~40 lines of dead code from
nodes.go and debug.go.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
d72a06c6c6 cmd/headscale/cli: remove legacy namespace and machine aliases
The --namespace flag on nodes list/register and debug create-node was
never wired to the --user flag, so its value was silently ignored.
Remove it along with the deprecateNamespaceMessage constant.

Also remove the namespace/ns command aliases on users and
machine/machines aliases on nodes, which have been deprecated since
the naming changes in 0.23.0.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
e816397d54 cmd/headscale/cli: remove no-op Args functions from serveCmd and dumpConfigCmd
These functions unconditionally return nil, which is the default cobra
behavior when Args is not set.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
22fccae125 cmd/headscale/cli: deduplicate expiration parsing and api-key flag validation
Add expirationFromFlag helper that parses the --expiration flag into a
timestamppb.Timestamp, replacing identical duration-parsing blocks in
api_key.go and preauthkeys.go.

Add apiKeyIDOrPrefix helper to validate the mutually-exclusive --id and
--prefix flags, replacing the duplicated switch block in expireAPIKeyCmd
and deleteAPIKeyCmd.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
6c08b49d63 cmd/headscale/cli: add confirmAction helper for force/prompt patterns
Centralise the repeated force-flag-check + YesNo-prompt logic into a
single confirmAction(cmd, prompt) helper.  Callers still decide what
to return on decline (error, message, or nil).
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
7b7b270126 cmd/headscale/cli: add mustMarkRequired helper for init-time flag validation
Replace three inconsistent MarkFlagRequired error-handling styles
(stdlib log.Fatal, zerolog log.Fatal, silently discarded) with a
single mustMarkRequired helper that panics on programmer error.

Also fixes a bug where renameNodeCmd.MarkFlagRequired("new-name")
targeted the wrong command (should be renameUserCmd), making the
--new-name flag effectively never required on "headscale users rename".
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
d6c39e65a5 cmd/headscale/cli: add printListOutput to centralise table-vs-JSON branching
Add a helper that checks the --output flag and either serialises as
JSON/YAML or invokes a table-rendering callback. This removes the
repeated format,_ := cmd.Flags().GetString("output") + if-branch from
the five list commands.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
8891ec9835 cmd/headscale/cli: remove deprecated output, SuccessOutput, ErrorOutput
All callers now use formatOutput/printOutput (non-exiting) with
RunE error returns, so the old os.Exit-based helpers are dead code.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
095106f498 cmd/headscale/cli: convert remaining commands to RunE
Convert the 10 commands that were still using Run with
ErrorOutput/SuccessOutput or log.Fatal/os.Exit:

- backfillNodeIPsCmd: use grpcRunE-style manual connection with
  error returns; simplify the confirm/force logic
- getPolicy, setPolicy, checkPolicy: replace ErrorOutput with
  fmt.Errorf returns in both the bypass-gRPC and gRPC paths
- serveCmd, configTestCmd: replace log.Fatal with error returns
- mockOidcCmd: replace log.Error+os.Exit with error return
- versionCmd, generatePrivateKeyCmd: replace SuccessOutput with
  printOutput
- dumpConfigCmd: return the error instead of swallowing it
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
e4fe216e45 cmd/headscale/cli: switch to RunE with grpcRunE and error returns
Rename grpcRun to grpcRunE: the inner closure now returns error
and the wrapper returns a cobra RunE-compatible function.

Change newHeadscaleCLIWithConfig to return an error instead of
calling log.Fatal/os.Exit, making connection failures propagate
through the normal error path.

Add formatOutput (returns error) and printOutput (writes to stdout)
as non-exiting replacements for the old output/SuccessOutput pair.
Extract output format string literals into package-level constants.
Mark the old ErrorOutput, SuccessOutput and output helpers as
deprecated; they remain temporarily for the unconverted commands.

Convert all 22 grpcRunE commands from Run+ErrorOutput+SuccessOutput
to RunE+fmt.Errorf+printOutput. Change usernameAndIDFromFlag to
return an error instead of calling ErrorOutput directly.

Update backfillNodeIPsCmd and policy.go callers of
newHeadscaleCLIWithConfig for the new 5-return signature while
keeping their Run-based pattern for now.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
e6546b2cea cmd/headscale/cli: silence cobra error/usage output and centralise error formatting
Set SilenceErrors and SilenceUsage on the root command so that
cobra never prints usage text for runtime errors. A SetFlagErrorFunc
callback re-enables usage output specifically for flag-parsing
errors (the kubectl pattern).

Add printError to utils.go and switch Execute() to ExecuteC() so
the returned error is formatted as JSON/YAML when --output requests
machine-readable output.
2026-02-20 11:42:07 +01:00
Kristoffer Dalby
aae2f7de71 cmd/headscale/cli: add grpcRun wrapper for gRPC client lifecycle
Add a grpcRun helper that wraps cobra RunFuncs, injecting a ready
gRPC client and context. The connection lifecycle (cancel, close)
is managed by the wrapper, eliminating the duplicated 3-line
boilerplate (newHeadscaleCLIWithConfig + defer cancel + defer
conn.Close) from 22 command handlers across 7 files.

Three call sites are intentionally left unconverted:
- backfillNodeIPsCmd: creates the client only after user confirmation
- getPolicy/setPolicy: conditionally use gRPC vs direct DB access
2026-02-20 11:42:07 +01:00
Florian Preinstorfer
cfb308b4a7 Add FAQ entry to migrate back to default IP prefixes 2026-02-19 17:16:40 +01:00
Florian Preinstorfer
4bb0241257 Require to update from one version to the next 2026-02-19 17:16:40 +01:00
Florian Preinstorfer
513544cc11 Simplify upgrade snippet with a link to the upgrade guide
Remove some duplicated text.
2026-02-19 17:16:40 +01:00
Florian Preinstorfer
d556df1c36 Extend upgrade guide with backup instructions 2026-02-19 17:16:40 +01:00
Kristoffer Dalby
d15ec28799 ci: pin Docker to v28 to avoid v29 breaking changes
Docker 29 (shipped with runner-images 20260209.23.1) breaks docker
build via Go client libraries (broken pipe writing build context)
and docker load/save with certain tarball formats. Add Docker's
official apt repository and install docker-ce 28.5.x in all CI
jobs that interact with Docker.

See https://github.com/actions/runner-images/issues/13474

Updates #3058
2026-02-19 08:21:23 +01:00
Kristoffer Dalby
eccf64eb58 all: fix staticcheck SA4006 in types_test.go
Use new(users["name"]) instead of extracting to intermediate
variables that staticcheck does not recognise as used with
Go 1.26 new(value) syntax.

Updates #3058
2026-02-19 08:21:23 +01:00
Kristoffer Dalby
43afeedde2 all: apply golangci-lint 2.9.0 fixes
Fix issues found by the upgraded golangci-lint:
- wsl_v5: add required whitespace in CLI files
- staticcheck SA4006: replace new(var.Field) with &localVar
  pattern since staticcheck does not recognize Go 1.26
  new(value) as a use of the variable
- staticcheck SA5011: use t.Fatal instead of t.Error for
  nil guard checks so execution stops
- unused: remove dead ptrTo helper function
2026-02-19 08:21:23 +01:00
Kristoffer Dalby
73613d7f53 db: fix database_versions table creation for PostgreSQL
Use GORM AutoMigrate instead of raw SQL to create the
database_versions table, since PostgreSQL does not support the
datetime type used in the raw SQL (it requires timestamp).
2026-02-19 08:21:23 +01:00
Kristoffer Dalby
30d18575be CHANGELOG: document strict version upgrade path 2026-02-19 08:21:23 +01:00
Kristoffer Dalby
70f8141abd all: upgrade from Go 1.26rc2 to Go 1.26.0 2026-02-19 08:21:23 +01:00
Kristoffer Dalby
82958835ce db: enforce strict version upgrade path
Add a version check that runs before database migrations to ensure
users do not skip minor versions or downgrade. This protects database
migrations and allows future cleanup of old migration code.

Rules enforced:
- Same minor version: always allowed (patch changes either way)
- Single minor upgrade (e.g. 0.27 -> 0.28): allowed
- Multi-minor upgrade (e.g. 0.25 -> 0.28): blocked with guidance
- Any minor downgrade: blocked
- Major version change: blocked
- Dev builds: warn but allow, preserve stored version

The version is stored in a purpose-built database_versions table
after migrations succeed. The table is created with raw SQL before
gormigrate runs to avoid circular dependencies.

Updates #3058
2026-02-19 08:21:23 +01:00
Kristoffer Dalby
9c3a3c5837 flake: upgrade golangci-lint to 2.9.0 and update nixpkgs 2026-02-19 08:21:23 +01:00
Florian Preinstorfer
faf55f5e8f Document how to use the provider identifier in the policy 2026-02-18 10:24:05 +01:00
Florian Preinstorfer
e3323b65e5 Describe how to set username instead of SPN for Kanidm 2026-02-18 10:24:05 +01:00
Florian Preinstorfer
8f60b819ec Refresh update path 2026-02-16 15:22:46 +01:00
Florian Preinstorfer
c29bcd2eaf Release planning happens in milestones 2026-02-16 15:22:46 +01:00
Florian Preinstorfer
890a044ef6 Add more UIs 2026-02-16 15:22:46 +01:00
Florian Preinstorfer
8028fa5483 No longer consider autogroup:self experimental 2026-02-16 15:22:46 +01:00
Kristoffer Dalby
a7f981e30e github: fix needs-more-info label race condition
Replace tiangolo/issue-manager with custom logic that distinguishes
bot comments from human responses. The issue-manager action treated
all comments equally, so the bot's own instruction comment would
trigger label removal on the next scheduled run.

Split into two jobs:
- remove-label-on-response: triggers on issue_comment from non-bot
  users, removes the needs-more-info label immediately
- close-stale: runs on daily schedule, uses nushell to iterate open
  needs-more-info issues, checks for human comments after the label
  was added, and closes after 3 days with no response
2026-02-15 19:42:47 +01:00
Kristoffer Dalby
e0d8c3c877 github: fix needs-more-info label race condition
Remove the `issues: labeled` trigger from the timer workflow.

When both workflows triggered on label addition, the comment workflow
would post the bot comment, and by the time the timer workflow ran,
issue-manager would see "a comment was added after the label" and
immediately remove the label due to `remove_label_on_comment: true`.

The timer workflow now only runs on:
- Daily cron (to close stale issues)
- issue_comment (to remove label when humans respond)
- workflow_dispatch (for manual testing)
2026-02-09 10:03:12 +01:00
Kristoffer Dalby
c1b468f9f4 github: update issue template contact links
Reorder contact links to show Discord first, then documentation.
Update Discord invite link and docs URL to current values.
2026-02-09 09:51:28 +01:00
Kristoffer Dalby
900f4b7b75 github: add support-request automation workflow
Add workflow that automatically closes issues labeled as
support-request with a message directing users to Discord
for configuration and support questions.

The workflow:
- Triggers when support-request label is added
- Posts a comment explaining this tracker is for bugs/features
- Links to documentation and Discord
- Closes the issue as "not planned"
2026-02-09 09:51:28 +01:00
Kristoffer Dalby
64f23136a2 github: add needs-more-info automation workflow
Add GitHub Actions automation that helps manage issues requiring
additional information from reporters:

- Post an instruction comment when 'needs-more-info' label is added,
  requesting environment details, debug logs from multiple nodes,
  configuration files, and proper formatting
- Automatically remove the label when anyone comments
- Close the issue after 3 days if no response is provided
- Exempt needs-more-info labeled issues from the stale bot

The instruction comment includes guidance on:
- Required environment and debug information
- Collecting logs from both connecting and connected-to nodes
- Proper redaction rules (replace consistently, never remove IPs)
- Formatting requirements for attachments and Markdown
- Encouragement to discuss on Discord before filing issues
2026-02-09 09:51:28 +01:00
Kristoffer Dalby
0f6d312ada all: upgrade to Go 1.26rc2 and modernize codebase
This commit upgrades the codebase from Go 1.25.5 to Go 1.26rc2 and
adopts new language features.

Toolchain updates:
- go.mod: go 1.25.5 → go 1.26rc2
- flake.nix: buildGo125Module → buildGo126Module, go_1_25 → go_1_26
- flake.nix: build golangci-lint from source with Go 1.26
- Dockerfile.integration: golang:1.25-trixie → golang:1.26rc2-trixie
- Dockerfile.tailscale-HEAD: golang:1.25-alpine → golang:1.26rc2-alpine
- Dockerfile.derper: golang:alpine → golang:1.26rc2-alpine
- .goreleaser.yml: go mod tidy -compat=1.25 → -compat=1.26
- cmd/hi/run.go: fallback Go version 1.25 → 1.26rc2
- .pre-commit-config.yaml: simplify golangci-lint hook entry

Code modernization using Go 1.26 features:
- Replace tsaddr.SortPrefixes with slices.SortFunc + netip.Prefix.Compare
- Replace ptr.To(x) with new(x) syntax
- Replace errors.As with errors.AsType[T]

Lint rule updates:
- Add forbidigo rules to prevent regression to old patterns
2026-02-08 12:35:23 +01:00
Kristoffer Dalby
20dff82f95 CHANGELOG: add minimum Tailscale version for 0.29.0
Update the 0.29.0 changelog entry to document the minimum
supported Tailscale client version (v1.76.0), which corresponds
to capability version 106 based on the 10-version support window.
2026-02-07 08:23:51 +01:00
Kristoffer Dalby
31c4331a91 capver: regenerate from docker tags
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2026-02-07 08:23:51 +01:00
Kristoffer Dalby
ce580f8245 all: fix golangci-lint issues (#3064) 2026-02-06 21:45:32 +01:00
Kristoffer Dalby
bfb6fd80df integration: fixup test
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
3acce2da87 errors: rewrite errors to follow go best practices
Errors should not start capitalised and they should not contain the word error
or state that they "failed" as we already know it is an error

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
4a9a329339 all: use lowercase log messages
Go style recommends that log messages and error strings should not be
capitalized (unless beginning with proper nouns or acronyms) and should
not end with punctuation.

This change normalizes all zerolog .Msg() and .Msgf() calls to start
with lowercase letters, following Go conventions and making logs more
consistent across the codebase.
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
dd16567c52 hscontrol/state,db: use zf constants for logging
Replace raw string field names with zf constants in state.go and
db/node.go for consistent, type-safe logging.

state.go changes:
- User creation, hostinfo validation, node registration
- Tag processing during reauth (processReauthTags)
- Auth path and PreAuthKey handling
- Route auto-approval and MapRequest processing

db/node.go changes:
- RegisterNodeForTest logging
- Invalid hostname replacement logging
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
e0a436cefc hscontrol/util/zlog/zf: add tag, authkey, and route constants
Add new zerolog field constants for improved logging consistency:

- Tag fields: CurrentTags, RemovedTags, RejectedTags, NewTags, OldTags,
  IsTagged, WasAuthKeyTagged
- Node fields: ExistingNodeID
- AuthKey fields: AuthKeyID, AuthKeyUsed, AuthKeyExpired, AuthKeyReusable,
  NodeKeyRotation
- Route fields: RoutesApprovedOld, RoutesApprovedNew, OldAnnouncedRoutes,
  NewAnnouncedRoutes, ApprovedRoutes, OldApprovedRoutes, NewApprovedRoutes,
  AutoApprovedRoutes, AllApprovedRoutes, RouteChanged
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
53cdeff129 hscontrol/mapper: use sub-loggers and zf constants
Add sub-logger patterns to worker(), AddNode(), RemoveNode() and
multiChannelNodeConn to eliminate repeated field calls. Use zf.*
constants for consistent field naming.

Changes in batcher_lockfree.go:
- Add wlog sub-logger in worker() with worker.id context
- Add log field to multiChannelNodeConn struct
- Initialize mc.log with node.id in newMultiChannelNodeConn()
- Add nlog sub-loggers in AddNode() and RemoveNode()
- Update all connection methods to use mc.log

Changes in batcher.go:
- Use zf.NodeID and zf.Reason in handleNodeChange()
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
7148a690d0 hscontrol/grpcv1: use EmbedObject and zf constants
Replace manual field extraction with EmbedObject for node logging
in gRPC handlers. Use zf.* constants for consistent field naming.

Changes:
- RegisterNode: use EmbedObject(node), zf.RegistrationKey, etc.
- SetTags: use EmbedObject(node)
- ExpireNode: use EmbedObject(node), zf.ExpiresAt
- RenameNode: use EmbedObject(node), zf.NewName
- SetApprovedRoutes: use zf.NodeID
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
4e73133b9f hscontrol/routes: use sub-logger and zf constants
Add sub-logger pattern to SetRoutes() to eliminate repeated node.id
field calls. Replace raw strings with zf.* constants throughout
the primary routes code for consistent field naming.

Changes:
- Add nlog sub-logger in SetRoutes() with node.id context
- Replace "prefix" with zf.Prefix
- Replace "changed" with zf.Changes
- Replace "newState" with zf.NewState
- Replace "finalState" with zf.FinalState
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
4f8724151e hscontrol/poll: use sub-logger pattern for mapSession
Replace the helper functions (logf, infof, tracef, errf) with a
zerolog sub-logger initialized in newMapSession(). The sub-logger
is pre-populated with session context (component, node, omitPeers,
stream) eliminating repeated field calls throughout the code.

Changes:
- Add log field to mapSession struct
- Initialize sub-logger with EmbedObject(node) and request context
- Remove logf/infof/tracef/errf helper functions
- Update all callers to use m.log.Level().Caller()... pattern
- Update noise.go to use sess.log instead of sess.tracef

This reduces code by ~20 lines and eliminates ~15 repeated field
calls per log statement.
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
91730e2a1d hscontrol: use EmbedObject for node logging
Replace manual Uint64("node.id")/Str("node.name") field patterns with
EmbedObject(node) which automatically includes all standard node fields
(id, name, machine key, node key, online status, tags, user).

This reduces code repetition and ensures consistent logging across:
- state.go: Connect/Disconnect, persistNodeToDB, AutoApproveRoutes
- auth.go: handleLogout, handleRegisterWithAuthKey
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
b5090a01ec cmd: use zf constants for zerolog field names
Update CLI logging to use zf.* constants instead of inline strings
for consistency with the rest of the codebase.
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
27f5641341 golangci: add forbidigo rule for zerolog field constants
Add a lint rule to enforce use of zf.* constants for zerolog field
names instead of inline string literals. This catches at lint time
any new code that doesn't follow the convention.

The rule matches common zerolog field methods (Str, Int, Bool, etc.)
and flags any usage with a string literal first argument.
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
cf3d30b6f6 types: add MarshalZerologObject to domain types
Implement zerolog.LogObjectMarshaler interface on domain types
for structured logging:

- Node: logs node.id, node.name, machine.key (short), node.key (short),
  node.is_tagged, node.expired, node.online, node.tags, user.name
- User: logs user.id, user.name, user.display, user.provider
- PreAuthKey: logs pak.id, pak.prefix (masked), pak.reusable,
  pak.ephemeral, pak.used, pak.is_tagged, pak.tags
- APIKey: logs api_key.id, api_key.prefix (masked), api_key.expiration

Security: PreAuthKey and APIKey only log masked prefixes, never full
keys or hashes. Uses zf.* constants for consistent field naming.
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
58020696fe zlog: add utility package for safe and consistent logging
Add hscontrol/util/zlog package with:

- zf subpackage: field name constants for compile-time safety
- SafeHostinfo: wrapper that redacts device fingerprinting data
- SafeMapRequest: wrapper that redacts client endpoints

The zf (zerolog fields) subpackage provides short constant names
(e.g., zf.NodeID instead of inline "node.id" strings) ensuring
consistent field naming across all log statements.

Security considerations:
- SafeHostinfo never logs: OSVersion, DeviceModel, DistroName
- SafeMapRequest only logs endpoint counts, not actual IPs
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
e44b402fe4 integration: update TestSubnetRouteACL for filter merging and IPProto
Update integration test expectations to match current policy behavior:

1. IPProto defaults include all four protocols (TCP, UDP, ICMPv4,
   ICMPv6) for port-range ACL rules, not just TCP and UDP.

2. Filter rules with identical SrcIPs and IPProto are now merged
   into a single rule with combined DstPorts, so the subnet router
   receives one filter rule instead of two.

Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
835b7eb960 policy: autogroup:internet does not generate packet filters
According to Tailscale SaaS behavior, autogroup:internet is handled
by exit node routing via AllowedIPs, not by packet filtering. ACL
rules with autogroup:internet as destination should produce no
filter rules for any node.

Previously, Headscale expanded autogroup:internet to public CIDR
ranges and distributed filters to exit nodes (because 0.0.0.0/0
"covers" internet destinations). This was incorrect.

Add detection for AutoGroupInternet in filter compilation to skip
filter generation for this autogroup. Update test expectations
accordingly.
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
95b1fd636e policy: fix wildcard DstPorts format and proto:icmp handling
Fix two compatibility issues discovered in Tailscale SaaS testing:

1. Wildcard DstPorts format: Headscale was expanding wildcard
   destinations to CGNAT ranges (100.64.0.0/10, fd7a:115c:a1e0::/48)
   while Tailscale uses {IP: "*"} directly. Add detection for
   wildcard (Asterix) alias type in filter compilation to use the
   correct format.

2. proto:icmp handling: The "icmp" protocol name was returning both
   ICMPv4 (1) and ICMPv6 (58), but Tailscale only returns ICMPv4.
   Users should use "ipv6-icmp" or protocol number 58 explicitly
   for IPv6 ICMP.

Update all test expectations accordingly. This significantly reduces
test file line count by replacing duplicated CGNAT range patterns
with single wildcard entries.
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
834ac27779 policy/v2: add subnet routes and exit node compatibility tests
Add comprehensive test file for validating Headscale's ACL engine
behavior for subnet routes and exit nodes against documented
Tailscale SaaS behavior.

Tests cover:
- Category A: Subnet route basics (wildcard includes routes, tag-based
  ACL excludes routes)
- Category B: Exit node behavior (exit routes not in SrcIPs)
- Category F: Filter placement rules (filters on destination nodes)
- Category G: Protocol and port restrictions
- Category R: Route coverage rules
- Category O: Overlapping routes
- Category H: Edge cases (wildcard formats, CGNAT handling)
- Category T: Tag resolution (tags resolve to node IPs only)
- Category I: IPv6 specific behavior

The tests document expected Tailscale SaaS behavior with TODOs marking
areas where Headscale currently differs. This provides a baseline for
compatibility improvements.
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
4a4032a4b0 changelog: document filter rule merging
Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
29aa08df0e policy: update test expectations for merged filter rules
Update test expectations across policy tests to expect merged
FilterRule entries instead of separate ones. Tests now expect:
- Single FilterRule with combined DstPorts for same source
- Reduced matcher counts for exit node tests

Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
0b1727c337 policy: merge filter rules with identical SrcIPs and IPProto
Tailscale merges multiple ACL rules into fewer FilterRule entries
when they have identical SrcIPs and IPProto, combining their DstPorts
arrays. This change implements the same behavior in Headscale.

Add mergeFilterRules() which uses O(n) hash map lookup to merge rules
with identical keys. DstPorts are NOT deduplicated to match Tailscale
behavior.

Also fix DestsIsTheInternet() to handle merged filter rules where
TheInternet is combined with other destinations - now uses superset
check instead of equality check.

Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
08fe2e4d6c policy: use CIDR format for autogroup:self destinations
Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
cb29cade46 docs: add compatibility test documentation
Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
f27298c759 changelog: document wildcard CGNAT range change
Add breaking change entry for the wildcard resolution change to use
CGNAT/ULA ranges instead of all IPs.
Updates #3036

Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
8baa14ef4a policy: use CGNAT/ULA ranges for wildcard resolution
Change Asterix.Resolve() to use Tailscale's CGNAT range (100.64.0.0/10)
and ULA range (fd7a:115c:a1e0::/48) instead of all IPs (0.0.0.0/0 and
::/0).
This better matches Tailscale's security model where wildcard (*) means
"any node in the tailnet" rather than literally "any IP address on the
internet".
Updates #3036

Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
ebdbe03639 policy: validate autogroup:self sources in ACL rules
Tailscale validates that autogroup:self destinations in ACL rules can
only be used when ALL sources are users, groups, autogroup:member, or
wildcard (*). Previously, Headscale only performed this validation for
SSH rules.
Add validateACLSrcDstCombination() to enforce that tags, autogroup:tagged,
hosts, and raw IPs cannot be used as sources with autogroup:self
destinations. Invalid policies like `tag:client → autogroup:self:*` are
now rejected at validation time, matching Tailscale behavior.
Wildcard (*) is allowed because autogroup:self evaluation narrows it
per-node to only the node's own IPs.

Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
f735502eae policy: add ICMP protocols to default and export constants
When ACL rules don't specify a protocol, Headscale now defaults to
[TCP, UDP, ICMP, ICMPv6] instead of just [TCP, UDP], matching
Tailscale's behavior.
Also export protocol number constants (ProtocolTCP, ProtocolUDP, etc.)
for use in external test packages, renaming the string protocol
constants to ProtoNameTCP, ProtoNameUDP, etc. to avoid conflicts.
This resolves 78 ICMP-related TODOs in the Tailscale compatibility
tests, reducing the total from 165 to 87.

Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
53d17aa321 policy: add comprehensive Tailscale ACL compatibility tests
Add extensive test coverage verifying Headscale's ACL policy behavior
matches Tailscale's coordination server. Tests cover:
- Source/destination resolution for users, groups, tags, hosts, IPs
- autogroup:member, autogroup:tagged, autogroup:self behavior
- Filter rule deduplication and merging semantics
- Multi-rule interaction patterns
- Error case validation
Key behavioral differences documented:
- Headscale creates separate filter entries per ACL rule; Tailscale
  merges rules with identical sources
- Headscale deduplicates Dsts within a rule; Tailscale does not
- Headscale does not validate autogroup:self source restrictions for
  ACL rules (only SSH rules); Tailscale rejects invalid sources
Tests are based on real Tailscale coordination server responses
captured from a test environment with 5 nodes (1 user-owned, 4 tagged).

Updates #3036
2026-02-05 19:29:16 +01:00
Kristoffer Dalby
14f833bdb9 policy: fix autogroup:self handling for tagged nodes
Skip autogroup:self destination processing for tagged nodes since they
can never match autogroup:self (which only applies to user-owned nodes).
Also reorder the IsTagged() check to short-circuit before accessing
User() to avoid potential nil pointer access on tagged nodes.

Updates #3036
2026-02-05 19:29:16 +01:00
Florian Preinstorfer
9e50071df9 Link Fosdem 2026 talk 2026-02-05 08:01:02 +01:00
Florian Preinstorfer
c907b0d323 Fix version in mkdocs 2026-02-05 08:01:02 +01:00
Kristoffer Dalby
97fa117c48 changelog: set 0.28 date
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2026-02-04 21:26:22 +01:00
Kristoffer Dalby
b5329ff0f3 flake.lock: update nixpkgs to 2026-02-03 2026-02-04 20:18:46 +01:00
Kristoffer Dalby
eac8a57bce flake.nix: update hashes for dependency changes
Update vendorHash for headscale after Go module dependency updates.
Update grpc-gateway from v2.27.4 to v2.27.7 with new source and
vendor hashes.
2026-02-04 20:18:46 +01:00
Kristoffer Dalby
44af046196 all: update Go module dependencies
Update all direct and indirect Go module dependencies to their latest
compatible versions.

Notable direct dependency updates:
- tailscale.com v1.94.0 → v1.94.1
- github.com/coreos/go-oidc/v3 v3.16.0 → v3.17.0
- github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.4 → v2.27.7
- github.com/puzpuzpuz/xsync/v4 v4.3.0 → v4.4.0
- golang.org/x/crypto v0.46.0 → v0.47.0
- golang.org/x/net v0.48.0 → v0.49.0
- google.golang.org/genproto updated to 2025-02-03

Notable indirect dependency updates:
- AWS SDK v2 group updated
- OpenTelemetry v1.39.0 → v1.40.0
- github.com/jackc/pgx/v5 v5.7.6 → v5.8.0
- github.com/gaissmai/bart v0.18.0 → v0.26.1

Add lockstep comment for gvisor.dev/gvisor noting it must be updated
together with tailscale.com, similar to the existing modernc.org/sqlite
comment.
2026-02-04 20:18:46 +01:00
Kristoffer Dalby
4a744f423b changelog: change api key format
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2026-02-04 20:18:46 +01:00
Kristoffer Dalby
ca75e096e6 integration: add test for tagged→user-owned conversion panic
Add TestTagsAuthKeyConvertToUserViaCLIRegister that reproduces the
exact panic from #3038: register a node with a tags-only PreAuthKey
(no user), force reauth with empty tags, then register via CLI with
a user. The mapper panics on node.Owner().Model().ID when User is nil.

The critical detail is using a tags-only PreAuthKey (User: nil). When
the key is created under a user, the node inherits the User pointer
from createAndSaveNewNode and the bug is masked.

Also add Owner() validity assertions to the existing unit test
TestTaggedNodeWithoutUserToDifferentUser to catch the nil pointer
at the unit test level.

Updates #3038
2026-02-04 15:44:55 +01:00
Kristoffer Dalby
ce7c256d1e state: set User pointer during tagged→user-owned conversion
processReauthTags sets UserID when converting a tagged node to
user-owned, but does not set the User pointer. When the node was
registered with a tags-only PreAuthKey (User: nil), the in-memory
NodeStore cache holds a node with User=nil. The mapper's
generateUserProfiles then calls node.Owner().Model().ID, which
dereferences the nil pointer and panics.

Set node.User alongside node.UserID in processReauthTags. Also add
defensive nil checks in generateUserProfiles to gracefully handle
nodes with invalid owners rather than panicking.

Fixes #3038
2026-02-04 15:44:55 +01:00
Kristoffer Dalby
4912ceaaf5 state: inline reauthExistingNode and convertTaggedNodeToUser
These were thin wrappers around applyAuthNodeUpdate that only added
logging. Move the logging into applyAuthNodeUpdate and call it directly
from HandleNodeFromAuthPath.

This simplifies the code structure without changing behavior.

Updates #3038
2026-02-04 15:44:55 +01:00
Kristoffer Dalby
d7f7f2c85e state: validate tags before UpdateNode to ensure consistency
Move tag validation before the UpdateNode callback in applyAuthNodeUpdate.
Previously, tag validation happened inside the callback, and the error
check occurred after UpdateNode had already committed changes to the
NodeStore. This left the NodeStore in an inconsistent state when tags
were rejected.

Now validation happens first, and UpdateNode is only called when we know
the operation will succeed. This follows the principle that UpdateNode
should only be called when we have all information and are ready to commit.

Also extract validateRequestTags as a reusable function and use it in
createAndSaveNewNode to deduplicate the tag validation logic.

Updates #3038
Updates #3048
2026-02-04 15:44:55 +01:00
Kristoffer Dalby
df184e5276 state: fix expiry handling during node tag conversion
Previously, expiry handling ran BEFORE processReauthTags(), using the
old tagged status to determine whether to set/clear expiry. This caused:

- Personal → Tagged: Expiry remained set (should be cleared to nil)
- Tagged → Personal: Expiry remained nil (should be set from client)

Move expiry handling after tag processing and handle all four transition
cases based on the new tagged status:

- Tagged → Personal: Set expiry from client request
- Personal → Tagged: Clear expiry (tagged nodes don't expire)
- Personal → Personal: Update expiry from client
- Tagged → Tagged: Keep existing nil expiry

Fixes #3048
2026-02-04 15:44:55 +01:00
Kristoffer Dalby
0630fd32e5 state: refactor HandleNodeFromAuthPath for clarity
Reorganize HandleNodeFromAuthPath (~300 lines) into a cleaner structure
with named conditions and extracted helper functions.

Changes:
- Add authNodeUpdateParams struct for shared update logic
- Extract applyAuthNodeUpdate for common reauth/convert operations
- Extract reauthExistingNode and convertTaggedNodeToUser handlers
- Extract createNewNodeFromAuth for new node creation
- Use named boolean conditions (nodeExistsForSameUser, existingNodeIsTagged,
  existingNodeOwnedByOtherUser) instead of compound if conditions
- Create logger with common fields (registration_id, user.name, machine.key,
  method) to reduce log statement verbosity

Updates #3038
2026-02-04 15:44:55 +01:00
Kristoffer Dalby
306aabbbce state: fix nil pointer panic when re-registering tagged node without user
When a node was registered with a tags-only PreAuthKey (no user
associated), the node had User=nil and UserID=nil. When attempting to
re-register this node to a different user via HandleNodeFromAuthPath,
two issues occurred:

1. The code called oldUser.Name() without checking if oldUser was valid,
   causing a nil pointer dereference panic.

2. The existing node lookup logic didn't find the tagged node because it
   searched by (machineKey, userID), but tagged nodes have no userID.
   This caused a new node to be created instead of updating the existing
   tagged node.

Fix this by restructuring HandleNodeFromAuthPath to:
1. First check if a node exists for the same user (existing behavior)
2. If not found, check if an existing TAGGED node exists with the same
   machine key (regardless of userID)
3. If a tagged node exists, UPDATE it to convert from tagged to
   user-owned (preserving the node ID)
4. Only create a new node if the existing node is user-owned by a
   different user

This ensures consistent behavior between:
- personal → tagged → personal (same node, same owner)
- tagged (no user) → personal (same node, new owner)

Add a test that reproduces the panic and conversion scenario by:
1. Creating a tags-only PreAuthKey (no user)
2. Registering a node with that key
3. Re-registering the same machine to a different user
4. Verifying the node ID stays the same (conversion, not creation)

Fixes #3038
2026-02-04 15:44:55 +01:00
Kristoffer Dalby
a09b0d1d69 policy/v2: add Caller() to log statements in compileACLWithAutogroupSelf
Both compileFilterRules and compileSSHPolicy include .Caller() on
their resolution error log statements, but compileACLWithAutogroupSelf
does not. Add .Caller() to the three log sites (source resolution
error, destination resolution error, nil destination) for consistent
debuggability across all compilation paths.

Updates #2990
2026-02-03 16:53:15 +01:00
Kristoffer Dalby
362696a5ef policy/v2: keep partial IPSet on SSH destination resolution errors
In compileSSHPolicy, when resolving other (non-autogroup:self)
destinations, the code discards the entire result on error via
`continue`. If a destination alias (e.g., a tag owned by a group
with a non-existent user) returns a partial IPSet alongside an
error, valid IPs are lost.

Both ACL compilation paths (compileFilterRules and
compileACLWithAutogroupSelf) already handle this correctly by
logging the error and using the IPSet if non-nil.

Remove the `continue` so the SSH path is consistent with the
ACL paths.

Fixes #2990
2026-02-03 16:53:15 +01:00
Kristoffer Dalby
1f32c8bf61 policy/v2: add IsTagged() guards to prevent panics on tagged nodes
Three related issues where User().ID() is called on potentially tagged
nodes without first checking IsTagged():

1. compileACLWithAutogroupSelf: the autogroup:self block at line 166
   lacks the !node.IsTagged() guard that compileSSHPolicy already has.
   If a tagged node is the compilation target, node.User().ID() may
   panic. Tagged nodes should never participate in autogroup:self.

2. compileSSHPolicy: the IsTagged() check is on the right side of &&,
   so n.User().ID() evaluates first and may panic before short-circuit
   can prevent it. Swap to !n.IsTagged() && n.User().ID() == ... to
   match the already-correct order in compileACLWithAutogroupSelf.

3. invalidateAutogroupSelfCache: calls User().ID() at ~10 sites
   without IsTagged() guards. Tagged nodes don't participate in
   autogroup:self, so they should be skipped when collecting affected
   users and during cache lookup. Tag status transitions are handled
   by using the non-tagged version's user ID.

Fixes #2990
2026-02-03 16:53:15 +01:00
Kristoffer Dalby
fb137a8fe3 policy/v2: use partial IPSet on group resolution errors in autogroup:self path
In compileACLWithAutogroupSelf, when a group contains a non-existent
user, Group.Resolve() returns a partial IPSet (with IPs from valid
users) alongside an error. The code was discarding the entire result
via `continue`, losing valid IPs. The non-autogroup-self path
(compileFilterRules) already handles this correctly by logging the
error and using the IPSet if non-empty.

Remove the `continue` on error for both source and destination
resolution, matching the existing behavior in compileFilterRules.
Also reorder the IsTagged check before User().ID() comparison
in the same-user node filter to prevent nil dereference on tagged
nodes that have no User set.

Fixes #2990
2026-02-03 16:53:15 +01:00
Kristoffer Dalby
c2f28efbd7 policy/v2: add test for issue #2990 same-user tagged device
Add test reproducing the exact scenario from issue #2990 where:
- One user (user1) in group:admin
- node1: user device (not tagged)
- node2: tagged with tag:admin, same user

The test verifies that peer visibility and packet filters are correct.

Updates #2990
2026-02-03 16:53:15 +01:00
Kristoffer Dalby
11f0d4cfdd policy/v2: include nodes with empty filters in BuildPeerMap
Previously, nodes with empty filter rules (e.g., tagged servers that are
only destinations, never sources) were skipped entirely in BuildPeerMap.
This could cause visibility issues when using autogroup:self with
multiple user groups.

Remove the len(filter) == 0 skip condition so all nodes are included in
nodeMatchers. Empty filters result in empty matchers where CanAccess()
returns false, but the node still needs to be in the map so symmetric
visibility works correctly: if node A can access node B, both should see
each other regardless of B's filter rules.

Add comprehensive tests for:
- Multi-group scenarios where autogroup:self is used by privileged users
- Nodes with empty filters remaining visible to authorized peers
- Combined access rules (autogroup:self + tags in same rule)

Updates #2990
2026-02-03 16:53:15 +01:00
Florian Preinstorfer
5d300273dc Add a tags page and describe a few common operations 2026-01-28 15:52:57 +01:00
Florian Preinstorfer
7f003ecaff Add a page to describe supported registration methods 2026-01-28 15:52:57 +01:00
Florian Preinstorfer
2695d1527e Use registration key instead of machine key 2026-01-28 15:52:57 +01:00
Florian Preinstorfer
d32f6707f7 Add missing words 2026-01-28 15:52:57 +01:00
Florian Preinstorfer
89e436f0e6 Bump year/version for mkdocs 2026-01-28 15:52:57 +01:00
Kristoffer Dalby
46daa659e2 state: omit AuthKeyID/AuthKey in node Updates to prevent FK errors
When a PreAuthKey is deleted, the database correctly sets auth_key_id
to NULL on referencing nodes via ON DELETE SET NULL. However, the
NodeStore (in-memory cache) retains the old AuthKeyID value.

When nodes send MapRequests (e.g., after tailscaled restart), GORM's
Updates() tries to persist the stale AuthKeyID, causing a foreign key
constraint error when trying to reference a deleted PreAuthKey.

Fix this by adding AuthKeyID and AuthKey to the Omit() call in all
three places where nodes are updated via GORM's Updates():
- persistNodeToDB (MapRequest processing)
- HandleNodeFromAuthPath (re-auth via web/OIDC)
- HandleNodeFromPreAuthKey (re-registration with preauth key)

This tells GORM to never touch the auth_key_id column or AuthKey
association during node updates, letting the database handle the
foreign key relationship correctly.

Added TestDeletedPreAuthKeyNotRecreatedOnNodeUpdate to verify that
deleted PreAuthKeys are not recreated when nodes send MapRequests.
2026-01-26 12:12:11 +00:00
Florian Preinstorfer
49b70db7f2 Conversion from personal to tagged node is reversible 2026-01-24 17:18:59 +01:00
Florian Preinstorfer
04b4071888 Fix node expiration success message
A node is expired when the requested expiration is either now or in the
past.
2026-01-24 15:18:12 +01:00
Florian Preinstorfer
ee127edbf7 Remove trace log for preauthkeys create
This always prints a TRC message on `preauthkeys create`. Since we don't
print anything for `apikeys create` either we might as well remove it.
2026-01-23 08:40:09 +01:00
Kristoffer Dalby
606e5f68a0 changelog: fixups for 0.28.0-beta.2
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2026-01-22 08:33:41 +00:00
Kristoffer Dalby
a04b21abc6 gen: regenerate protobuf and type views
Regenerated with updated grpc-gateway and tailscale dependencies.
2026-01-21 19:17:10 +00:00
Kristoffer Dalby
92caadcee6 nix: update vendor hash for Go dependencies 2026-01-21 19:17:10 +00:00
Kristoffer Dalby
aa29fd95a3 derp: migrate to derpserver package API
tailscale.com v1.94.0 moved derp.Server to the derpserver subpackage.
Update imports and type references accordingly.
2026-01-21 19:17:10 +00:00
Kristoffer Dalby
0565e01c2f go.mod: update dependencies
Notable updates:
- tailscale.com v1.86.5 -> v1.94.0
- modernc.org/sqlite v1.39.1 -> v1.44.3
- modernc.org/libc v1.66.10 -> v1.67.6
- google.golang.org/grpc v1.75.1 -> v1.78.0
2026-01-21 19:17:10 +00:00
Kristoffer Dalby
aee1d2a640 nix: fix deprecated attributes and update dev tools
- Fix deprecated flake output attributes (overlay -> overlays.default,
  devShell -> devShells.default, defaultPackage -> packages.default)
- Use stdenv.hostPlatform.system instead of deprecated prev.system
- Update grpc-gateway 2.24.0 -> 2.27.4
- Update protobuf-language-server
- Update nixpkgs
2026-01-21 19:17:10 +00:00
Kristoffer Dalby
ee303186b3 docs: add changelog for SSH policy changes
Document breaking changes:
- Wildcard (*) no longer supported as SSH destination
- SSH source/destination validation enforces Tailscale's security model

Fixes #3009
Fixes #3010
2026-01-21 17:01:30 +00:00
Kristoffer Dalby
e9a94f00a9 integration: update SSH tests for validation rules
Update integration tests to use valid SSH patterns:

- TestSSHOneUserToAll: use autogroup:member and autogroup:tagged
  instead of wildcard destination
- TestSSHMultipleUsersAllToAll: use autogroup:self instead of
  username destinations for group-to-user SSH access

Updates #3009
Updates #3010
2026-01-21 17:01:30 +00:00
Kristoffer Dalby
d40203e153 policy: update tests for SSH validation rules
Update unit tests to use valid SSH patterns that conform to Tailscale's
security model:

- Change group->user destinations to group->tag
- Change tag->user destinations to tag->tag
- Update expected error messages for new validation format
- Add proper tagged/untagged node setup in filter tests

Updates #3009
Updates #3010
2026-01-21 17:01:30 +00:00
Kristoffer Dalby
5688c201e9 policy/v2: validate SSH source/destination combinations
Add validation for SSH source/destination combinations that enforces
Tailscale's security model:

- Tags/autogroup:tagged cannot SSH to user-owned devices
- autogroup:self destination requires source to contain only users/groups
- Username destinations require source to be that same single user only
- Wildcard (*) is no longer supported as SSH destination; use
  autogroup:member or autogroup:tagged instead

The validateSSHSrcDstCombination() function is called during policy
validation to reject invalid configurations at load time.

Fixes #3009
Fixes #3010
2026-01-21 17:01:30 +00:00
Shourya Gautam
4e1834adaf db: use PolicyManager for RequestTags migration
Refactor the RequestTags migration (202601121700-migrate-hostinfo-request-tags)
to use PolicyManager.NodeCanHaveTag() instead of reimplementing tag validation.

Changes:
- NewHeadscaleDatabase now accepts *types.Config to allow migrations
  access to policy configuration
- Add loadPolicyBytes helper to load policy from file or DB based on config
- Add standalone GetPolicy(tx *gorm.DB) for use during migrations
- Replace custom tag validation logic with PolicyManager

Benefits:
- Full HuJSON parsing support (not just JSON)
- Proper group expansion via PolicyManager
- Support for nested tags and autogroups
- Works with both file and database policy modes
- Single source of truth for tag validation


Co-Authored-By: Shourya Gautam <shouryamgautam@gmail.com>
2026-01-21 15:10:29 +01:00
Kristoffer Dalby
22afb2c61b policy: fix asymmetric peer visibility with autogroup:self
When autogroup:self was combined with other ACL rules (e.g., group:admin
-> *:*), tagged nodes became invisible to users who should have access.

The BuildPeerMap function had two code paths:
- Global filter path: used symmetric OR logic (if either can access, both
  see each other)
- Autogroup:self path: used asymmetric logic (only add peer if that
  specific direction has access)

This caused problems with one-way rules like admin -> tagged-server. The
admin could access the server, but since the server couldn't access the
admin, neither was added to the other's peer list.

Fix by using symmetric visibility in the autogroup:self path, matching
the global filter path behavior: if either node can access the other,
both should see each other as peers.

Credit: vdovhanych <vdovhanych@users.noreply.github.com>

Fixes #2990
2026-01-21 14:35:16 +01:00
Kristoffer Dalby
b3c4d0ec81 integration: add tests for API key expire/delete by ID
Extend TestApiKeyCommand to test the new --id flag for expire and
delete commands, verifying that API keys can be managed by their
database ID in addition to the existing --prefix method.

Updates #2986
2026-01-20 17:13:38 +01:00
Kristoffer Dalby
b82c9c9c0e docs: add changelog entry for API key expire/delete by ID
Fixes #2986
2026-01-20 17:13:38 +01:00
Kristoffer Dalby
e0bae9b769 cli: add --id flag to API key expire/delete commands
Add --id flag as an alternative to --prefix for expiring and
deleting API keys. This allows users to use the ID shown in
'headscale apikeys list' output, which is more convenient than
the prefix.

Either --id or --prefix must be provided; both flags are optional
but at least one is required.

Updates #2986
2026-01-20 17:13:38 +01:00
Kristoffer Dalby
a194712c34 grpc: support expire/delete API keys by ID
Update ExpireApiKey and DeleteApiKey handlers to accept either ID or
prefix for identifying the API key. Returns InvalidArgument error if
neither or both are provided.

Add tests for:
- Expire by ID
- Expire by prefix (backwards compatibility)
- Delete by ID
- Delete by prefix (backwards compatibility)
- Error when neither ID nor prefix provided
- Error when both ID and prefix provided

Updates #2986
2026-01-20 17:13:38 +01:00
Kristoffer Dalby
8776745428 gen: regenerate protobuf code
Updates #2986
2026-01-20 17:13:38 +01:00
Kristoffer Dalby
b01eda721c proto: add id field to API key expire/delete requests
Add id field to ExpireApiKeyRequest and DeleteApiKeyRequest messages.
This allows API keys to be expired or deleted by their database ID
in addition to the existing prefix-based lookup.

Updates #2986
2026-01-20 17:13:38 +01:00
Kristoffer Dalby
42bd9cd058 state: add GetAPIKeyByID method
Add GetAPIKeyByID method to the state layer, delegating to the existing
database layer function. This enables API key lookup by ID in addition
to the existing prefix-based lookup.

Updates #2986
2026-01-20 17:13:38 +01:00
Kristoffer Dalby
515a22e696 go.mod: remove gopkg.in/check.v1 dependency
Remove gopkg.in/check.v1 as a direct dependency now that all tests
have been migrated to testify.
2026-01-20 15:41:33 +01:00
Kristoffer Dalby
6654142fbe cmd/headscale: migrate tests from check.v1 to testify
Convert config loading tests from gopkg.in/check.v1 Suite-based testing
to standard Go tests with testify assert/require.

Changes:
- Remove Suite boilerplate (Test, Suite type, SetUpSuite, TearDownSuite)
- Convert TestConfigFileLoading and TestConfigLoading to standalone tests
- Replace check assertions with testify equivalents
2026-01-20 15:41:33 +01:00
Kristoffer Dalby
424e26d636 db: migrate tests from check.v1 to testify
Migrate all database tests from gopkg.in/check.v1 Suite-based testing
to standard Go tests with testify assert/require.

Changes:
- Remove empty Suite files (hscontrol/suite_test.go, hscontrol/mapper/suite_test.go)
- Convert hscontrol/db/suite_test.go to modern helpers only
- Convert 6 Suite test methods in node_test.go to standalone tests
- Convert 5 Suite test methods in api_key_test.go to standalone tests
- Fix stale global variable reference in db_test.go

The legacy TestListPeers Suite method was renamed to TestListPeersManyNodes
to avoid conflict with the existing modern TestListPeers function, as they
test different aspects (basic peer listing vs ID filtering).
2026-01-20 15:41:33 +01:00
Kristoffer Dalby
d9cbb96603 state: add unit test for DeleteUser change signal
Updates #2967
2026-01-20 15:41:19 +01:00
Kristoffer Dalby
c1cfb59b91 ci: add ACL unknown user tests to integration workflow
Updates #2967
2026-01-20 15:41:19 +01:00
Kristoffer Dalby
4be13baf3f state: update policy manager when deleting users
Make DeleteUser call updatePolicyManagerUsers() to refresh the policy
manager's cached user list after user deletion. This ensures consistency
with CreateUser, UpdateUser, and RenameUser which all update the policy
manager.

Previously, DeleteUser only removed the user from the database without
updating the policy manager. This could leave stale user references in
the cached user list, potentially causing issues when policy is
re-evaluated.

The gRPC handler now uses the change returned from DeleteUser instead of
manually constructing change.UserRemoved().

Fixes #2967
2026-01-20 15:41:19 +01:00
Kristoffer Dalby
98c0817b95 integration: add tests for ACL group with deleted/unknown users
Add DeleteUser method to ControlServer interface and implement it in
HeadscaleInContainer to enable testing user deletion scenarios.

Add two integration tests for issue #2967:
- TestACLGroupWithUnknownUser: tests that valid users can communicate
  when a group references a non-existent user
- TestACLGroupAfterUserDeletion: tests connectivity after deleting a
  user that was referenced in an ACL group

These tests currently pass but don't fully reproduce the reported issue
where deleted users break connectivity for the entire group.

Updates #2967
2026-01-20 15:41:19 +01:00
Kristoffer Dalby
951fd5a8e7 cli: show Owner column in preauthkeys list
Replace the Tags column with an Owner column that displays:
- Tags (newline-separated) if the key has ACL tags
- User name if the key is associated with a user
- Dash (-) if neither is present

This aligns the CLI output with the tags-as-identity model where
preauthkeys can be created with either tags or user ownership.
2026-01-20 12:53:20 +01:00
Kristoffer Dalby
b8f3e09046 integration: fix tags-only auth key tests
- Rename TestTagsAuthKeyWithoutUserIgnoresAdvertisedTags to
  TestTagsAuthKeyWithoutUserRejectsAdvertisedTags to reflect actual
  behavior (PreAuthKey registrations reject advertised tags)
- Fix TestTagsAuthKeyWithoutUserInheritsTags to use ListNodes() without
  user filter since tags-only nodes don't have a user association

Updates #2977
2026-01-20 12:53:20 +01:00
Kristoffer Dalby
4ab06930a2 hscontrol: handle tags-only PreAuthKeys in registration
HandleNodeFromPreAuthKey assumed pak.User was always set, but
tags-only PreAuthKeys have nil User. This caused nil pointer
dereference when registering nodes with tags-only keys.

Also updates integration tests to use GetTags() instead of the
removed GetValidTags() method.

Updates #2977
2026-01-20 12:53:20 +01:00
Kristoffer Dalby
165c5f0491 cli: fix preauthkeys expire/delete argument validation
The Args function incorrectly required positional arguments but
the commands use --id flag. Move validation into Run function.
2026-01-20 12:53:20 +01:00
Kristoffer Dalby
c8c3c9d4a0 hscontrol: allow CreatePreAuthKey without user when tags provided
Handle case where user is 0 in gRPC layer to support tags-only
auth keys.
2026-01-20 12:53:20 +01:00
Kristoffer Dalby
4dd1b49a35 integration: update CLI tests for ID-based preauthkey commands
Remove --user flag from list commands.
Change expire command to use --id flag instead of --user and key.
2026-01-20 12:53:20 +01:00
Kristoffer Dalby
db6882b5f5 integration: update DeleteAuthKey to use ID 2026-01-20 12:53:20 +01:00
Kristoffer Dalby
1325fd8b27 cli,hscontrol: use ID-based preauthkey operations 2026-01-20 12:53:20 +01:00
Kristoffer Dalby
8631581852 gen: regenerate proto code 2026-01-20 12:53:20 +01:00
Kristoffer Dalby
1398d01bd8 proto: change preauthkey API to ID-based operations
Remove user parameter from ListPreAuthKeys.
Change ExpirePreAuthKey and DeletePreAuthKey to use key ID.
2026-01-20 12:53:20 +01:00
Kristoffer Dalby
00da5361b3 integration: test tags-only auth key behavior
Add tests for auth keys without user ownership to verify tags from
key are used regardless of --advertise-tags flag.
2026-01-20 12:53:20 +01:00
Kristoffer Dalby
740d2b5a2c integration: support auth keys without user
Add AuthKeyOptions to create auth keys owned by tags only.
2026-01-20 12:53:20 +01:00
Kristoffer Dalby
3b4b9a4436 hscontrol: fix tag updates not propagating to node self view
When SetNodeTags changed a node's tags, the node's self view wasn't
updated. The bug manifested as: the first SetNodeTags call updates
the server but the client's self view doesn't update until a second
call with the same tag.

Root cause: Three issues combined to prevent self-updates:

1. SetNodeTags returned PolicyChange which doesn't set OriginNode,
   so the mapper's self-update check failed.

2. The Change.Merge function didn't preserve OriginNode, so when
   changes were batched together, OriginNode was lost.

3. generateMapResponse checked OriginNode only in buildFromChange(),
   but PolicyChange uses RequiresRuntimePeerComputation which
   bypasses that code path entirely and calls policyChangeResponse()
   instead.

The fix addresses all three:
- state.go: Set OriginNode on the returned change
- change.go: Preserve OriginNode (and TargetNode) during merge
- batcher.go: Pass isSelfUpdate to policyChangeResponse so the
  origin node gets both self info AND packet filters
- mapper.go: Add includeSelf parameter to policyChangeResponse

Fixes #2978
2026-01-20 10:13:47 +01:00
Kristoffer Dalby
1b6db34b93 integration/tags: add self-tag validation to existing tests
Update 8 tests that involve admin tag assignment via SetNodeTags()
to verify both server-side state and node self view updates:

- TestTagsAuthKeyWithTagAdminOverrideReauthPreserves
- TestTagsAuthKeyWithTagCLICannotModifyAdminTags
- TestTagsAuthKeyWithoutTagCLINoOpAfterAdminWithReset
- TestTagsAuthKeyWithoutTagCLINoOpAfterAdminWithEmptyAdvertise
- TestTagsAuthKeyWithoutTagCLICannotReduceAdminMultiTag
- TestTagsUserLoginCLINoOpAfterAdminAssignment
- TestTagsUserLoginCLICannotRemoveAdminTags
- TestTagsAdminAPICanSetUnownedTag

Each test now validates that tag updates propagate to the node's
own self view using assertNodeSelfHasTagsWithCollect, addressing
the issue #2978 scenario where tag changes were observed to
propagate to peers but not to the node itself.

Updates #2978
2026-01-20 10:13:47 +01:00
Kristoffer Dalby
07a4b1b1fd integration/tags: add dedicated issue #2978 reproduction test
Add TestTagsIssue2978ReproTagReplacement that specifically tests the
scenario from issue #2978:
- Register node with tag:foo via web auth with --advertise-tags
- Admin changes tag to tag:bar via SetNodeTags
- Verify client's self view updates (not just server-side)

The test performs multiple tag replacements with timing checks to
verify whether tag updates propagate to the node's self view after
the first call (fixed behavior) or only after a redundant second
call (bug behavior).

Add helper functions for test validation:
- assertNodeSelfHasTagsWithCollect: validates client's status.Self.Tags
- assertNetmapSelfHasTagsWithCollect: validates client's netmap.SelfNode.Tags

Updates #2978
2026-01-20 10:13:47 +01:00
Kristoffer Dalby
2e180d2587 integration: add test for reauth tag removal
Add TestTagsUserLoginReauthWithEmptyTagsRemovesAllTags to validate that
nodes can be untagged via `tailscale up --advertise-tags= --force-reauth`.

The test verifies:
- Node starts with tags and is owned by tagged-devices
- After reauth with empty tags, all tags are removed
- Node ownership returns to the authenticating user

Updates #2979
2026-01-17 10:13:24 +01:00
Kristoffer Dalby
0451dd4718 state: allow untagging nodes via reauth with empty RequestTags
When a node re-authenticates via OIDC/web auth with empty RequestTags
(from `tailscale up --advertise-tags= --force-reauth`), remove all tags
and return ownership to the authenticating user.

This allows nodes to transition from any tagged state (including nodes
originally registered with a tagged pre-auth key) back to user-owned.

Fixes #2979
2026-01-17 10:13:24 +01:00
Kristoffer Dalby
a6696582a4 util/dns: fix variable redeclaration in ValidateDNSName 2026-01-17 10:13:24 +01:00
Kristoffer Dalby
00f22a8443 state: disable key expiry for nodes with approved advertise-tags
Extends #2971 fix to also cover nodes that authenticate as users but
become tagged immediately via --advertise-tags. When RequestTags are
approved by policy, the node's expiry is now disabled, consistent with
nodes registered via tagged PreAuthKeys.
2026-01-16 17:05:59 +01:00
Kristoffer Dalby
1d9900273e state: disable key expiry for tagged nodes
Nodes registered with tagged PreAuthKeys now have key expiry disabled,
matching Tailscale's behavior. User-owned nodes continue to use the
client-requested expiry.

On re-authentication, tagged nodes preserve their disabled expiry while
user-owned nodes can update their expiry from the client request.

Fixes #2971
2026-01-16 17:05:59 +01:00
Florian Preinstorfer
18e13f6ffa Link to headscale.net for docs 2026-01-16 14:54:04 +01:00
Florian Preinstorfer
a445278f76 Mention tags on the features page 2026-01-16 14:54:04 +01:00
Florian Preinstorfer
8387c9cd82 Fix ownership description for auto approved routers/exits
Just the tags tag:router and tag:exit are owned by alice. Upon join,
those nodes will have their ownership transferred from alice to the
system user "tagged-devices".
2026-01-16 14:54:04 +01:00
Florian Preinstorfer
25a7434830 Bump version in mkdocs 2026-01-16 14:54:04 +01:00
Florian Preinstorfer
183a38715c Fix list-routes examples 2026-01-16 14:54:04 +01:00
Florian Preinstorfer
99d35fbbbc Document oidc.email_verification_required 2026-01-16 14:54:04 +01:00
Florian Preinstorfer
d50108c722 Changelog: mark oidc.email_verified_required as breaking
Headscale is now stricter and this is a breaking change if authorization
filters are used and at least one user has an unverified email address.
2026-01-16 14:54:04 +01:00
Florian Preinstorfer
6d21a4a3fe Document /version in the API docs 2026-01-16 14:54:04 +01:00
Florian Preinstorfer
7d81dca9aa Document how to disable the metrics interfaces 2026-01-16 14:54:04 +01:00
Kristoffer Dalby
3689f05407 types: use Username() in User.Proto() when Name is empty
User.Proto() was returning u.Name directly, which is empty for OIDC
users who have their identifier in the Email field instead. This caused
"headscale nodes list" to show empty user names for OIDC-authenticated
nodes.

Only fall back to Username() when Name is empty, which provides a
display-friendly identifier (Email > ProviderIdentifier > ID). This
ensures OIDC users display their email while CLI users retain their
original Name.

Fixes #2972
2026-01-14 13:16:51 +01:00
chen
bb30208f97 Add headscale-piying web UI to docs 2026-01-14 12:56:05 +01:00
Florian Preinstorfer
c3e2e57f8e Clarify autogroup:member
Fixes: #3007
2026-01-14 12:44:18 +01:00
Kristoffer Dalby
e43f19df79 CHANGELOG: add breaking change for Node API simplification 2026-01-14 09:32:46 +01:00
Kristoffer Dalby
0516c0ec37 gen: regenerate protobuf code 2026-01-14 09:32:46 +01:00
Kristoffer Dalby
eec54cbbf3 api/v1: replace ForcedTags/InvalidTags/ValidTags with Tags
Simplifies the API by exposing a single tags field instead of three
separate fields for forced, invalid, and valid tags. The distinction
between these was an internal implementation detail that should not
be exposed in the public API.

Marks fields 18-20 as reserved to prevent field number reuse.
2026-01-14 09:32:46 +01:00
Kristoffer Dalby
72fcb93ef3 cli: ensure tagged-devices is included in profile list (#2991) 2026-01-09 16:31:23 +01:00
Kristoffer Dalby
f5c779626a nix: use testers.nixosTest instead of nixosTest
nixosTest was renamed to testers.nixosTest in nixpkgs.
2026-01-09 12:57:32 +01:00
Kristoffer Dalby
d227b3a135 docs: update integration testing docs for concurrent execution
Update documentation to reflect the new concurrent test execution
capabilities and add guidance on run ID isolation.

AGENTS.md:
- Add examples for running multiple tests concurrently
- Document run ID format and container naming conventions
- Update "Critical Notes" to explain isolation mechanisms

.claude/agents/headscale-integration-tester.md:
- Add "Concurrent Execution and Run ID Isolation" section
- Document forbidden and safe operations for cleanup
- Add "Agent Session Isolation Rules" for multi-agent environments
- Add 6th core responsibility about concurrent execution awareness
- Add ISOLATION PRINCIPLE to critical principles
- Update pre-test cleanup documentation
2026-01-09 12:34:16 +01:00
Kristoffer Dalby
0bcfdc29ad cmd/hi: enable concurrent test execution
Remove the concurrent test prevention logic and update cleanup to use
run ID-based isolation, allowing multiple tests to run simultaneously.

Changes:
- cleanup: Add killTestContainersByRunID() to clean only containers
  belonging to a specific run, add cleanupStaleTestContainers() to
  remove only stopped/exited containers without affecting running tests
- docker: Remove RunningTestInfo, checkForRunningTests(), and related
  error types, update cleanupAfterTest() to use run ID-based cleanup
- run: Remove Force flag and concurrent test prevention check

The test runner now:
- Allows multiple concurrent test runs on the same Docker daemon
- Cleans only stale containers before tests (not running ones)
- Cleans only containers with matching run ID after tests
- Prints run ID and monitoring info for operator visibility
2026-01-09 12:34:16 +01:00
Kristoffer Dalby
87c230d251 integration: add run ID isolation for concurrent test execution
Add run ID-based isolation to container naming and network setup to
enable multiple integration tests to run concurrently on the same
Docker daemon without conflicts.

Changes:
- hsic: Add run ID prefix to headscale container names and use dynamic
  port allocation for metrics endpoint (port 0 lets kernel assign)
- tsic: Add run ID prefix to tailscale container names
- dsic: Add run ID prefix to DERP container names
- scenario: Use run ID-aware test suite container name for network setup

Container naming now follows: {type}-{runIDShort}-{identifier}-{hash}
Example: ts-mdjtzx-1-74-fgdyls, hs-mdjtzx-pingallbyip-abc123

The run ID is obtained from HEADSCALE_INTEGRATION_RUN_ID environment
variable via dockertestutil.GetIntegrationRunID().
2026-01-09 12:34:16 +01:00
github-actions[bot]
84c092a9f9 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/35f5903' (2025-10-15)
  → 'github:NixOS/nixpkgs/3edc4a3' (2025-12-27)
2025-12-28 08:25:28 +01:00
Florian Preinstorfer
9146140217 Add headscale-operator
Ref: #1523
2025-12-23 20:22:57 +01:00
Kristoffer Dalby
5103b35f3c sqliteconfig: add config opt for tx locking
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-22 14:01:40 +01:00
Justin Angel
7be20912f5 oidc: make email verification configurable
Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-12-18 11:42:32 +00:00
Kristoffer Dalby
e8753619de capver: generate
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-12-18 10:02:23 +01:00
Kristoffer Dalby
251e16d772 tools/capver: regenerate from docker tags
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-12-18 10:02:23 +01:00
Kristoffer Dalby
3f0bfe28cc changelog: prepare for 0.28.0 beta
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-12-17 15:15:43 +01:00
Kristoffer Dalby
82d4275c3b mapper: correct some variable names missed from change
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-17 13:19:26 +01:00
Kristoffer Dalby
f3767dddf8 batcher: ensure removal from batcher
Fixes #2924

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-17 13:19:26 +01:00
Florian Preinstorfer
5c6cd62df1 Legacy preauthkeys must be used as-is 2025-12-17 13:05:08 +01:00
Shourya Gautam
56bec66a44 app: only wire up debug server if set
Fixes #2871

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-17 12:32:04 +01:00
Kristoffer Dalby
f0e464dc36 policy: add test to confirm group cant approve tag
Confirms #2891 is implemented correctly.

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-17 09:32:05 +01:00
Kristoffer Dalby
2c3c943acf .github/workflows: split long TestAutoApproveMultiNetwork into multiple jobs
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-17 09:32:05 +01:00
Kristoffer Dalby
a50bd13930 integration: prepare AutoApprove test for new tags
Validates #2891

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-17 09:32:05 +01:00
Kristoffer Dalby
5655ef86d7 AGENTS: golangci-lint from main, no "full matrix"
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-17 09:32:05 +01:00
Kristoffer Dalby
21ba197d06 integration: make entrypoint override more robust
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-16 10:12:36 +01:00
Kristoffer Dalby
9d77207ed8 policy: clarify usernam resolve comment
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-16 10:12:36 +01:00
Kristoffer Dalby
cf1ad47b42 flake: remove hi from shell
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-16 10:12:36 +01:00
Kristoffer Dalby
a288f04a1a Dockerfile: align packages
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-16 10:12:36 +01:00
Kristoffer Dalby
5767ca5085 change: smarter change notifications
This commit replaces the ChangeSet with a simpler bool based
change model that can be directly used in the map builder to
build the appropriate map response based on the change that
has occured. Previously, we fell back to sending full maps
for a lot of changes as that was consider "the safe" thing to
do to ensure no updates were missed.

This was slightly problematic as a node that already has a list
of peers will only do full replacement of the peers if the list
is non-empty, meaning that it was not possible to remove all
nodes (if for example policy changed).

Now we will keep track of last seen nodes, so we can send remove
ids, but also we are much smarter on how we send smaller, partial
maps when needed.

Fixes #2389

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-16 10:12:36 +01:00
Kristoffer Dalby
f67ed36fe2 integration: replicate tag propagation issue
This commit adds tests to validate that there are
issues with how we propagate tag changes in the system.

This replicates #2389

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-16 10:12:36 +01:00
Kristoffer Dalby
506bd8c8eb policy: more accurate node change
This commit changes so that node changes to the policy is
calculated if any of the nodes has changed in a way that might
affect the policy.

Previously we just checked if the number of nodes had changed,
which meant that if a node was added and removed, we would be
in a bad state.

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-16 10:12:36 +01:00
Kristoffer Dalby
daf9f36c78 editorconfig: add basic editor config
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-16 10:12:36 +01:00
Kristoffer Dalby
616c0e895d batcher: fix closed panic
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-12-15 16:28:27 +01:00
Kristoffer Dalby
c4600346f9 .github/workflows: prebuilt integration test artifacts (#2954)
This PR restructures the integration tests and prebuilds all common assets used in all tests:

Headscale and Tailscale HEAD image
hi binary that is used to run tests
go cache is warmed up for compilation of the test
This essentially means we spend 6-10 minutes building assets before any tests starts, when that is done, all tests can just sprint through.

It looks like we are saving 3-9 minutes per test, and since we are limited to running max 20 concurrent tests across the repo, that means we had a lot of double work.

There is currently 113 checks, so we have to do five runs of 20, and the saving should be quite noticeable! I think the "worst case" saving would be 20+min and "best case" probably towards an hour.
2025-12-12 23:01:52 +01:00
Kristoffer Dalby
642073f4b8 types: add option to disable taildrop, improve tests (#2955) 2025-12-12 11:35:16 +01:00
Kristoffer Dalby
87bd67318b golangci-lint: use forbidigo to block time.Sleep (#2946) 2025-12-10 16:45:59 +00:00
Kristoffer Dalby
0e1673041c all: remove deadcode (#2952) 2025-12-10 15:55:15 +01:00
Kristoffer Dalby
f3f2d30004 cli: better formatting of lists (#2951) 2025-12-10 12:33:21 +01:00
Kristoffer Dalby
c8376e44a2 mapper: move tail node conversion to node type (#2950) 2025-12-10 09:16:22 +01:00
Rogan Lynch
5d0a6ab0e9 fix: list-routes command now respects identifier filter with JSON output
Fixes #2927

In v0.27.0, the list-routes command with -i flag and -o json output
was returning all nodes instead of just the specified node.

The issue was that JSON output was happening before the identifier
filtering logic. This change moves the JSON output to after both
the identifier filter and route existence filter are applied,
ensuring the correct filtered results are returned.

This restores the v0.26.1 behavior where:
  headscale nodes list-routes -i 12 -o json
correctly returns only node 12's route information.
2025-12-10 06:19:17 +01:00
Kristoffer Dalby
22ee2bfc9c tags: process tags on registration, simplify policy (#2931)
This PR investigates, adds tests and aims to correctly implement Tailscale's model for how Tags should be accepted, assigned and used to identify nodes in the Tailscale access and ownership model.

When evaluating in Headscale's policy, Tags are now only checked against a nodes "tags" list, which defines the source of truth for all tags for a given node. This simplifies the code for dealing with tags greatly, and should help us have less access bugs related to nodes belonging to tags or users.

A node can either be owned by a user, or a tag.

Next, to ensure the tags list on the node is correctly implemented, we first add tests for every registration scenario and combination of user, pre auth key and pre auth key with tags with the same registration expectation as observed by trying them all with the Tailscale control server. This should ensure that we implement the correct behaviour and that it does not change or break over time.

Lastly, the missing parts of the auth has been added, or changed in the cases where it was wrong. This has in large parts allowed us to delete and simplify a lot of code.
Now, tags can only be changed when a node authenticates or if set via the CLI/API. Tags can only be fully overwritten/replaced and any use of either auth or CLI will replace the current set if different.

A user owned device can be converted to a tagged device, but it cannot be changed back. A tagged device can never remove the last tag either, it has to have a minimum of one.
2025-12-08 18:51:07 +01:00
Dusty Mabe
1f5df017a1 hscontrol: log acme/autocert errors (#2933) 2025-12-08 16:39:30 +00:00
Florian Preinstorfer
bba91a89be Use lists for integration docs
Refactor the tables in "Tools" and "WebUI" integration pages to lists.
Lists are easier to extend and contributions are easier to review.
2025-12-08 12:50:19 +01:00
Florian Preinstorfer
6359511a62 Use debian13 distroless images 2025-12-07 20:58:29 +01:00
adinhodovic
d2fcd5b95b docs(tools): Add tailscale-exporter
A Prometheus exporter for Tailscale and Headscale that provides tailnet-level metrics using the Tailscale/Headscale API.
2025-12-07 14:39:08 +01:00
Kristoffer Dalby
15c84b34e0 policy: allow tags to own tags (#2930) 2025-12-06 10:23:35 +01:00
Kristoffer Dalby
eb788cd007 make tags first class node owner (#2885)
This PR changes tags to be something that exists on nodes in addition to users, to being its own thing. It is part of moving our tags support towards the correct tailscale compatible implementation.

There are probably rough edges in this PR, but the intention is to get it in, and then start fixing bugs from 0.28.0 milestone (long standing tags issue) to discover what works and what doesnt.

Updates #2417
Closes #2619
2025-12-02 12:01:25 +01:00
Kristoffer Dalby
705b239677 changelog: prep for 0.27.2 rc
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-12-02 12:01:02 +01:00
Kristoffer Dalby
cb4d5b1906 hscontrol/oidc: fix ACL policy not applied to new OIDC nodes (#2890)
Fixes #2888
Fixes #2896
2025-12-02 12:01:02 +01:00
Vitalij Dovhanyc
0078eb7790 chore: fix filterHash to work with autogroup:self in the acls (#2882) 2025-12-02 12:01:02 +01:00
Kristoffer Dalby
3cf2d7195a auth: ensure machines are allowed in when pak change (#2917) 2025-12-02 12:01:02 +01:00
Kristoffer Dalby
16d811b306 cli: remove node move command (#2922) 2025-12-01 21:43:31 +01:00
Kristoffer Dalby
eec196d200 modernize: run gopls modernize to bring up to 1.25 (#2920) 2025-12-01 19:40:25 +01:00
Kristoffer Dalby
bfcd9d261d cmd/hi: reject if we are already running (#2919) 2025-12-01 19:40:08 +01:00
Florian Preinstorfer
f00c412cde Move static doc assets into docs/assets 2025-11-28 21:27:54 +01:00
Florian Preinstorfer
2010805712 Provide Headscale's favicon at its expected place
Assets need to reside within the docs/ directory for mkdocs to pick them
up.
2025-11-28 21:27:54 +01:00
Florian Preinstorfer
c5133ee5d3 Fix trailing whitespace 2025-11-28 21:27:54 +01:00
Florian Preinstorfer
9c33cbfdc8 Exclude docs/ only for prettier pre-commit hook
Applying the the built-in hooks to docs/ seems to be fine.
2025-11-28 21:27:54 +01:00
Florian Preinstorfer
9b327f6b56 Update pre-commit-hooks 2025-11-28 21:27:54 +01:00
Kristoffer Dalby
9368fee1c5 generate: add new patches (#2921) 2025-11-28 17:00:52 +01:00
Kristoffer Dalby
ed78bf4b98 cmd/hi: improve test cleanup to reduce CI disk usage (#2881) 2025-11-28 16:59:54 +01:00
Kristoffer Dalby
db293e0698 hscontrol/state: make NodeStore batch configuration tunable (#2886) 2025-11-28 16:38:29 +01:00
pwuersch
9c4c017eac docs: Enable automatic theme switching
Signed-off-by: pwuersch <49908921+pwuersch@users.noreply.github.com>
2025-11-24 06:43:22 +01:00
János Benjamin Antal
14af9b3ab1 Add docs to manage headscale from another local user 2025-11-24 06:37:35 +01:00
Florian Preinstorfer
72d5fd04a7 Remove duplicated documentation and link to getting started instead 2025-11-18 11:07:49 +01:00
Florian Preinstorfer
e86d063056 Mention /health instead of /windows 2025-11-18 11:07:49 +01:00
Acha
e0c9e18e22 Update OIDC documentation for allowed groups filter
Clarify configuration for allowed groups filter with Microsoft Entra ID.
2025-11-15 17:44:00 +01:00
Florian Preinstorfer
21af106f68 Containers should be read-only
This improves security and explicitly fails on startup when a user picks
the wrong directory to store its data.

- Run in read-only mode
- Make /var/run/headscale a read-write tmpfs
- Mount the config volume read-only
- Use the /health endpoint to check if Headscale is up
2025-11-14 14:51:27 +01:00
Kristoffer Dalby
7fb0f9a501 batcher: send endpoint and derp only updates. (#2856) 2025-11-13 20:38:49 +01:00
Kristoffer Dalby
4b25976288 db: add comment to always check errors in migration
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-13 09:46:40 -06:00
Kristoffer Dalby
1c146f70e9 db: remove _schema from migration tests
Previously we tested migrations on schemas and dumps
of old databases.

The problems with testing migrations against the schemas
is that the migration table is empty, so we try to run
migrations that are already ran on that schema, which might
blow up.

This commit removes the schema approach and just leaves all
the dumps, which include the migration table.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-13 09:46:40 -06:00
Florian Preinstorfer
249630bed8 Add API documentation
Document the API endpoint and the built-in swagger docs at /swagger. The
remote control docs are just a use case for gRPC - move it in the API
docs and update links to it.
2025-11-13 15:22:55 +01:00
Kristoffer Dalby
75247f82b8 hscontrol/db: add init schema, drop pre-0.25 support (#2883) 2025-11-13 04:44:10 -06:00
Tianon Gravi
665cc44094 Explicitly drop apt-get clean and use dist-clean
The former is a no-op in the base images (45491f2c5c/scripts/debuerreotype-minimizing-config (L87-L109)), and `apt-get dist-clean` is a safer/better version of the `rm -rf /var/lib/apt/lists/*` that keeps the cryptographic bits that help prevent downgrade attacks.
2025-11-13 07:15:22 +01:00
Kristoffer Dalby
8394e7094a capver: update latest (#2774) 2025-11-12 20:26:54 +01:00
Kristoffer Dalby
da9018a0eb types: make pre auth key use bcrypt (#2853) 2025-11-12 16:36:36 +01:00
Kristoffer Dalby
e3ced80278 hscontrol: consolidate assets into single package
Move favicon.png, style.css, and headscale.svg to hscontrol/assets/
and create a single assets.go file with all embed directives.

Update hscontrol/handlers.go and hscontrol/templates/general.go to
use the centralized assets package.
2025-11-12 08:28:12 -06:00
Kristoffer Dalby
09c9762fe0 hscontrol: convert BlankHandler to use elem-go 2025-11-12 08:28:12 -06:00
Kristoffer Dalby
75e24de7bd flake: disable CGO in dev shell
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-12 08:28:12 -06:00
Kristoffer Dalby
2aa5b8b68d changelog: add entry for templates redesign 2025-11-12 08:28:12 -06:00
Kristoffer Dalby
4e77e910c5 hscontrol: use octal literal syntax in test 2025-11-12 08:28:12 -06:00
Kristoffer Dalby
a496864762 hscontrol: add template HTML consistency test
Add test to validate HTML template output consistency across all
templates (OIDC callback, registration, Windows, Apple).

Verifies all templates produce valid HTML5 with:
- Proper DOCTYPE declaration
- HTML5 lang attribute
- UTF-8 charset
- Viewport meta tag
- Semantic HTML structure

Ensures template refactoring maintains standards compliance.
2025-11-12 08:28:12 -06:00
Kristoffer Dalby
3ed1067a95 hscontrol/templates: refactor to use CSS classes and embedded files
Refactor template system to use go:embed for external assets and
CSS classes for styling instead of inline styles:

- general.go: Add go:embed directives for style.css and headscale.svg,
  replace inline styles with CSS classes (H1, H2, H3, P, etc.),
  add mdTypesetBody wrapper with Material for MkDocs styling

- apple.go, oidc_callback.go, register_web.go, windows.go:
  Update to use new CSS-based helper functions (H1, H2, P, etc.)
  and mdTypesetBody for consistent layout

This separates content from presentation, making templates easier
to maintain and update. All styling is now centralized in style.css
with Material for MkDocs design system.
2025-11-12 08:28:12 -06:00
Kristoffer Dalby
285c4e46a9 hscontrol/templates: add Material for MkDocs design assets
Add design system assets for HTML templates:
- headscale.svg: Logo with optimized viewBox for proper alignment
- style.css: Material for MkDocs CSS variables and typography
- design.go: Design system constants for consistent styling

The logo viewBox is adjusted to 32.92 0 1247.08 640 to eliminate
whitespace from the original export and ensure left alignment with
text content.
2025-11-12 08:28:12 -06:00
Kristoffer Dalby
89285c317b templates: migrate OIDC callback to elem-go
Replace html/template with type-safe elem-go templating for OIDC
callback page. Improves consistency with other templates and provides
compile-time safety. All UI elements and styling preserved.
2025-11-12 08:28:12 -06:00
Kristoffer Dalby
d14be8d43b nix: add NixOS module and tests (#2857) 2025-11-12 13:11:38 +00:00
Kristoffer Dalby
000d5c3b0c prettier: use standard config for all files including changelog (#2879) 2025-11-12 13:59:43 +01:00
Teej
218a8db1b9 add favicon to webpages (#2858)
Co-authored-by: TeejMcSteez <tjhall047@gmail.com>
Co-authored-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-11-12 03:46:57 +00:00
Kristoffer Dalby
1dcb04ce9b changelog: add changelog entry
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-11 17:27:00 -06:00
Andrey Bobelev
299cef4e99 fix: free ips from usedIps ipset on DeleteNode 2025-11-11 17:27:00 -06:00
Kristoffer Dalby
6d24afba1c add pre-commit hooks, move claude to agents. (#2877) 2025-11-11 20:35:23 +01:00
Kristoffer Dalby
f658a8eacd mkdocs: 0.27.1
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-11 13:17:02 -06:00
Kristoffer Dalby
785168a7b8 changelog: prepare for 0.27.1
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-11 13:17:02 -06:00
Kristoffer Dalby
3bd4ecd9cd fix: preserve node expiry when tailscaled restarts
When tailscaled restarts, it sends RegisterRequest with Auth=nil and
Expiry=zero. Previously this was treated as a logout because
time.Time{}.Before(time.Now()) returns true.

Add early return in handleRegister() to detect this case and preserve
the existing node state without modification.

Fixes #2862
2025-11-11 12:47:48 -06:00
Kristoffer Dalby
3455d1cb59 hscontrol/db: fix RenameUser to use Updates()
RenameUser only modifies Name field, should use Updates() not Save().
2025-11-11 12:47:48 -06:00
Kristoffer Dalby
ddd31ba774 hscontrol: use Updates() instead of Save() for partial updates
Changed UpdateUser and re-registration flows to use Updates() which only
writes modified fields, preventing unintended overwrites of unchanged fields.

Also updated UsePreAuthKey to use Model().Update() for single field updates
and removed unused NodeSave wrapper.
2025-11-11 12:47:48 -06:00
Kristoffer Dalby
4a8dc2d445 hscontrol/state,db: preserve node expiry on MapRequest updates
Fixes a regression introduced in v0.27.0 where node expiry times were
being reset to zero when tailscaled restarts and sends a MapRequest.

The issue was caused by using GORM's Save() method in persistNodeToDB(),
which overwrites ALL fields including zero values. When a MapRequest
updates a node (without including expiry information), Save() would
overwrite the database expiry field with a zero value.

Changed to use Updates() which only updates non-zero values, preserving
existing database values when struct pointer fields are nil.

In BackfillNodeIPs, we need to explicitly update IPv4/IPv6 fields even
when nil (to remove IPs), so we use Select() to specify those fields.

Added regression test that validates expiry is preserved after MapRequest.

Fixes #2862
2025-11-11 12:47:48 -06:00
Kristoffer Dalby
773a46a968 integration: add test to replicate #2862
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-11 12:47:48 -06:00
Kristoffer Dalby
4728a2ba9e hscontrol/state: allow expired auth keys for node re-registration
Skip auth key validation for existing nodes re-registering with the same
NodeKey. Pre-auth keys are only required for initial authentication.

NodeKey rotation still requires a valid auth key as it is a security-sensitive
operation that changes the node's cryptographic identity.

Fixes #2830
2025-11-11 05:12:59 -06:00
Florian Preinstorfer
abed534628 Document how to restrict access to exit nodes per user/group
Updates: #2855
Ref: #2784
2025-11-11 11:51:35 +01:00
Kristoffer Dalby
21e3f2598d policy: fix issue where non existent user results in empty ssh pol
When we encounter a source we cannot resolve, we skipped the whole rule,
even if some of the srcs could be resolved. In this case, if we had one user
that exists and one that does not.

In the regular policy, we log this, and still let a rule be created from what
does exist, while in the SSH policy we did not.

This commit fixes it so the behaviour is the same.

Fixes #2863

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-10 20:34:12 +01:00
Kristoffer Dalby
a28d9bed6d policy: reproduce 2863 in test
reproduce that if a user does not exist, the ssh policy ends up empty

Updates #2863

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-10 20:34:12 +01:00
Kristoffer Dalby
28faf8cd71 db: add defensive removal of old indicies
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-10 20:07:29 +01:00
Kristoffer Dalby
5a2ee0c391 db: add comment about removing migrations
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-10 17:32:39 +01:00
Andrey Bobelev
5cd15c3656 fix: make state cookies valid when client uses multiple login URLs
On Windows, if the user clicks the Tailscale icon in the system tray,
it opens a login URL in the browser.

When the login URL is opened, `state/nonce` cookies are set for that particular URL.

If the user clicks the icon again, a new login URL is opened in the browser,
and new cookies are set.

If the user proceeds with auth in the first tab,
the redirect results in a "state did not match" error.

This patch ensures that each opened login URL sets an individual cookie
that remains valid on the `/oidc/callback` page.

`TestOIDCMultipleOpenedLoginUrls` illustrates and tests this behavior.
2025-11-10 16:27:46 +01:00
Kristoffer Dalby
2024219bd1 types: Distinguish subnet and exit node access
When we fixed the issue of node visibility of nodes
that only had access to eachother because of a subnet
route, we gave all nodes access to all exit routes by
accident.

This commit splits exit nodes and subnet routes in the
access.

If a matcher indicates that the node should have access to
any part of the subnet routes, we do not remove it from the
node list.

If a matcher destination is equal to the internet, and the
target node is an exit node, we also do not remove the access.

Fixes #2784
Fixes #2788

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
d9c3eaf8c8 matcher: Add func for comparing Dests and TheInternet
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
bd9cf42b96 types: NodeView CanAccess uses internal
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
d7a43a7cf1 state: use AllApprovedRoutes instead of SubnetRoutes
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
1c0bb0338d types: split SubnetRoutes and ExitRoutes
There are situations where the subnet routes and exit nodes
must be treated differently. This splits it so SubnetRoutes
only returns routes that are not exit routes.

It adds `IsExitRoutes` and `AllApprovedRoutes` for convenience.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
c649c89e00 policy: Reproduce exit node visibility issues
Reproduces #2784 and #2788

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Vitalij Dovhanyc
af2de35b6c chore: fix autogroup:self with other acl rules (#2842) 2025-11-02 10:48:27 +00:00
Kristoffer Dalby
02c7c1a0e7 cli: only validate bypass-grpc set policy (#2854) 2025-11-02 09:42:59 +00:00
Copilot
d23fa26395 Fix flaky TestShuffleDERPMapDeterministic by ensuring deterministic map iteration (#2848)
Co-authored-by: kradalby <98431+kradalby@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2025-11-02 10:05:23 +01:00
Andrey
f9bb88ad24 expire nodes with a custom timestamp (#2828) 2025-11-01 08:09:13 +01:00
Kristoffer Dalby
456a5d5cce db: ignore _litestream tables when validating (#2843) 2025-11-01 07:08:22 +00:00
Kristoffer Dalby
ddbd3e14ba db: remove all old, unused tables (#2844) 2025-11-01 08:03:37 +01:00
Florian Preinstorfer
0a43aab8f5 Use Debian 12 as minimum version for the deb package 2025-10-28 05:55:26 +01:00
Florian Preinstorfer
4bd614a559 Use current stable base images for Debian and Alpine 2025-10-28 05:55:26 +01:00
Kristoffer Dalby
19a33394f6 changelog: set 0.27 date (#2823) 2025-10-27 12:14:02 +01:00
Kristoffer Dalby
84fe3de251 integration: reduce TestAutoApproveMultiNetwork matrix to 3 tests (#2815) 2025-10-27 11:08:52 +00:00
Paarth Shah
450a7b15ec #2796: Add creation_time and ko_data_creation_time to goreleaser.yml kos 2025-10-27 11:18:57 +01:00
Kristoffer Dalby
64b7142e22 .goreleaser: add upgrade section (#2820) 2025-10-27 10:41:52 +01:00
Kristoffer Dalby
52d27d58f0 hscontrol: add /version HTTP endpoint (#2821) 2025-10-27 10:41:34 +01:00
Kristoffer Dalby
e68e2288f7 gen: test-integration (#2814) 2025-10-24 17:22:53 +02:00
Kristoffer Dalby
c808587de0 cli: do not show new pre-releases on stable (#2813) 2025-10-24 13:15:53 +02:00
Kristoffer Dalby
2bf1200483 policy: fix autogroup:self propagation and optimize cache invalidation (#2807) 2025-10-23 17:57:41 +02:00
Kristoffer Dalby
66826232ff integration: add tests for api bypass (#2811) 2025-10-22 16:30:25 +02:00
Kristoffer Dalby
1cdea7ed9b stricter hostname validation and replace (#2383) 2025-10-22 13:50:39 +02:00
Elyas Asmad
2c9e98d3f5 fix: guard every error statement with early return (#2810) 2025-10-22 13:48:07 +02:00
Florian Preinstorfer
8becb7e54a Mention explicitly that @ is only required in policy 2025-10-21 14:28:03 +02:00
Florian Preinstorfer
ed38d00aaa Fix autogroup:self alternative example
Also indent and split the comment into two lines to avoid horizontal
scrolling.
2025-10-21 14:28:03 +02:00
Florian Preinstorfer
8010cc574e Remove outdated hint about an empty config file 2025-10-19 17:14:15 +02:00
Juanjo Presa
c97d0ff23d Fix fatal error on missing config file by handling viper.ConfigFileNotFoundError
Correctly identify Viper's ConfigFileNotFoundError in LoadConfig to log a warning and use defaults, unifying behavior with empty config files. Fixes fatal error when no config file is present for CLI commands relying on environment variables.
2025-10-19 15:29:47 +02:00
Florian Preinstorfer
047dbda136 Add FAQ on how to disable log submission
Fixes: #2793
2025-10-19 08:24:23 +02:00
Florian Preinstorfer
2a1392fb5b Add healthcheck to container docs 2025-10-19 08:22:30 +02:00
Florian Preinstorfer
46477b8021 Downgrade completed broadcast message to debug 2025-10-18 07:56:59 +02:00
Kristoffer Dalby
c87471136b integration: eventually fixups (#2799) 2025-10-17 08:28:30 +02:00
Kristoffer Dalby
e7a28a14af changelog: prepare for 0.27.0 (#2797) 2025-10-16 19:04:07 +02:00
Kristoffer Dalby
4912769ab3 update dependencies (#2798) 2025-10-16 19:03:30 +02:00
Stavros Kois
c07cc491bf add health command (#2659)
* add health command
* update health check implementation to allow for more checks to added over time
* add change changelog entry
2025-10-16 12:00:11 +00:00
Vitalij Dovhanyc
c2a58a304d feat: add autogroup:self (#2789) 2025-10-16 12:59:52 +02:00
Kristoffer Dalby
fddc7117e4 stability and race conditions in auth and node store (#2781)
This PR addresses some consistency issues that was introduced or discovered with the nodestore.

nodestore:
Now returns the node that is being put or updated when it is finished. This closes a race condition where when we read it back, we do not necessarily get the node with the given change and it ensures we get all the other updates from that batch write.

auth:
Authentication paths have been unified and simplified. It removes a lot of bad branches and ensures we only do the minimal work.
A comprehensive auth test set has been created so we do not have to run integration tests to validate auth and it has allowed us to generate test cases for all the branches we currently know of.

integration:
added a lot more tooling and checks to validate that nodes reach the expected state when they come up and down. Standardised between the different auth models. A lot of this is to support or detect issues in the changes to nodestore (races) and auth (inconsistencies after login and reaching correct state)

This PR was assisted, particularly tests, by claude code.
2025-10-16 12:17:43 +02:00
Florian Preinstorfer
881a6b9227 The sequential prefix allocation uses a best-effort approach
Fixes: #2682
2025-10-15 17:07:13 +02:00
yckwan
3fbde7a1b6 Update official.md
in the step 5 file default value is [line11] ExecStart=/usr/bin/headscale serve
2025-10-13 17:06:56 +02:00
Andrey Bobelev
c4a8c038cd fix: return valid AuthUrl in followup request on expired reg id
- tailscale client gets a new AuthUrl and sets entry in the regcache
- regcache entry expires
- client doesn't know about that
- client always polls followup request а gets error

When user clicks "Login" in the app (after cache expiry), they visit
invalid URL and get "node not found in registration cache". Some clients
on Windows for e.g. can't get a new AuthUrl without restart the app.

To fix that we can issue a new reg id and return user a new valid
AuthUrl.

RegisterNode is refactored to be created with NewRegisterNode() to
autocreate channel and other stuff.
2025-10-11 05:57:39 +02:00
Andrey Bobelev
022098fe4e chore: make reg cache expiry tunable
Mostly for the tests, opts:

- tuning.register_cache_expiration
- tuning.register_cache_cleanup
2025-10-11 05:57:39 +02:00
Florian Preinstorfer
bd35fcf338 Add FAQ entry about policy migration in the database 2025-09-17 16:32:29 +02:00
Florian Preinstorfer
2d680b5ebb Misc typos and spelling 2025-09-17 16:32:29 +02:00
Kristoffer Dalby
ed3a9c8d6d mapper: send change instead of full update (#2775) 2025-09-17 14:23:21 +02:00
Kristoffer Dalby
4de56c40d8 flake: goreleaser doesnt follow go nix convention (#2779) 2025-09-17 09:41:05 +02:00
github-actions[bot]
40b3d54c1f flake.lock: Update (#2755) 2025-09-14 16:15:51 +00:00
Florian Preinstorfer
30d12dafed Add FAQ entry about the recommended upgrade path 2025-09-13 08:15:01 +02:00
Kristoffer Dalby
2b30a15a68 cmd: add option to get and set policy directly from database (#2765) 2025-09-12 16:55:15 +02:00
Kristoffer Dalby
2938d03878 policy: reject unsupported fields (#2764) 2025-09-12 14:47:56 +02:00
Kristoffer Dalby
1b1c989268 {policy, node}: allow return paths in route reduction (#2767) 2025-09-12 11:47:51 +02:00
Kristoffer Dalby
3950f8f171 cli: use gobuild version handling (#2770) 2025-09-12 11:47:31 +02:00
Kristoffer Dalby
ee0ef396a2 policy: fix ssh usermap, fixing autogroup:nonroot (#2768) 2025-09-12 09:12:30 +02:00
Kristoffer Dalby
7056fbb63b derp: fix flaky shuffle test (#2772) 2025-09-11 13:49:02 +00:00
Kristoffer Dalby
c91b9fc761 poll: add missing godoc (#2763) 2025-09-11 14:15:19 +02:00
Kristoffer Dalby
d41fb4d540 app: fix sigint hanging
When the node notifier was replaced with batcher, we removed
its closing, but forgot to add the batchers so it was never
stopping node connections and waiting forever.

Fixes #2751

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-11 11:53:26 +02:00
Kristoffer Dalby
01c1f6f82a policy: validate error message for asterix in ssh (#2766) 2025-09-10 18:41:43 +02:00
Oleksii Samoliuk
3f6657ae57 fix: documentation 2025-09-09 20:54:47 +02:00
Kristoffer Dalby
0512f7c57e .github/ISSUE_TEMPLATE: add node number to environment
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 19:04:23 +02:00
Florian Preinstorfer
c6427aa296 Use group id instead of group name for Entra ID 2025-09-09 12:23:34 +02:00
Florian Preinstorfer
4e6d42d5bd Keycloak's group format is configurable 2025-09-09 12:23:34 +02:00
Florian Preinstorfer
8ff5baadbe Refresh OIDC docs
The UserInfo endpoint is always queried since 5d8a2c2.

This allows to use all OIDC related features without any extra
configuration on Authelia.

For Keycloak, its sufficient to add the groups mapper to the userinfo
endpoint.
2025-09-09 12:23:34 +02:00
Florian Preinstorfer
2f3c365b68 Describe how to remove a DERP region
Add documentation for d29feaef.

Fixes: #2450
2025-09-09 11:05:30 +02:00
Kristoffer Dalby
4893cdac74 integration: make timestamp const
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
476f30ab20 state: ensure netinfo is preserved and not removed
the client will send a lot of fields as `nil` if they have
not changed. NetInfo, which is inside Hostinfo, is one of those
fields and we often would override the whole hostinfo meaning that
we would remove netinfo if it hadnt changed.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
233dffc186 lint and leftover
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
39443184d6 gen: new proto version
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
0303b76e1f postgres uses more memory
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
684239e015 cmd/mapresponses: add mini tool to inspect mapresp state from integration
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
81b3e8f743 util: harden parsing of traceroute
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
50ed24847b debug: add json and improve
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
9b962956b5 integration: Eventually, debug output, lint and format
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
3b16b75fe6 integration: rework retry for waiting for node sync
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
9d236571f4 state/nodestore: in memory representation of nodes
Initial work on a nodestore which stores all of the nodes
and their relations in memory with relationship for peers
precalculated.

It is a copy-on-write structure, replacing the "snapshot"
when a change to the structure occurs. It is optimised for reads,
and while batches are not fast, they are grouped together
to do less of the expensive peer calculation if there are many
changes rapidly.

Writes will block until commited, while reads are never
blocked.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
38be30b6d4 derp: allow override to ip for debug
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
7f8b14f6f3 .github/workflows: remove integration retry
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
3326c5b7ec cmd/hi: lint and format
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
b6d5788231 mapper: produce map before poll
Before this patch, we would send a message to each "node stream"
that there is an update that needs to be turned into a mapresponse
and sent to a node.

Producing the mapresponse is a "costly" afair which means that while
a node was producing one, it might start blocking and creating full
queues from the poller and all the way up to where updates where sent.

This could cause updates to time out and being dropped as a bad node
going away or spending too time processing would cause all the other
nodes to not get any updates.

In addition, it contributed to "uncontrolled parallel processing" by
potentially doing too many expensive operations at the same time:

Each node stream is essentially a channel, meaning that if you have 30
nodes, we will try to process 30 map requests at the same time. If you
have 8 cpu cores, that will saturate all the cores immediately and cause
a lot of wasted switching between the processing.

Now, all the maps are processed by workers in the mapper, and the number
of workers are controlable. These would now be recommended to be a bit
less than number of CPU cores, allowing us to process them as fast as we
can, and then send them to the poll.

When the poll recieved the map, it is only responsible for taking it and
sending it to the node.

This might not directly improve the performance of Headscale, but it will
likely make the performance a lot more consistent. And I would argue the
design is a lot easier to reason about.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
33e9e7a71f CLAUDE: split into agents
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
ccd79ed8d4 mcp: add some standard mcp server
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
f6c4b338fd .github/workflows: add generate check
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
306d8e1bd4 integration: validate expected online status in ping
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
4927e9d590 fix: improve mapresponses and profiles extraction in hi tool
- Fix directory hierarchy flattening by using full paths instead of filepath.Base()
- Remove redundant container hostname prefixes from directory names
- Strip top-level directory from tar extraction to avoid nested structure
- Ensure parent directories exist before creating files
- Results in clean structure: control_logs/mapresponses/1-ts-client/file.json
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
8e25f7f9dd bunch of qol (#2748) 2025-08-27 17:09:13 +02:00
github-actions[bot]
1a7a2f4196 flake.lock: Update (#2699) 2025-08-24 12:07:32 +00:00
Dylan Blanqué
860a8a597f Update tools.md
Share/Contribute Headscale Zabbix Monitoring scripts and templates.

Thank you for the awesome application to everyone involved in Headscale's development!
2025-08-24 06:05:21 +02:00
cuiweixie
a2a6d20218 Refactor to use reflect.TypeFor 2025-08-23 20:43:49 +02:00
Andrey Bobelev
d29feaef79 chore(derp): allow nil regions in DERPMaps
Previously, nil regions were not properly handled. This change allows users to disable regions in DERPMaps.

Particularly useful to disable some official regions.
2025-08-23 06:54:14 +02:00
Andrey Bobelev
630bfd265a chore(derp): prioritize loading DERP maps from URLs
This allows users to override default entries provided via URL
2025-08-23 06:54:14 +02:00
Florian Preinstorfer
e949859d33 Add DERP docs 2025-08-22 12:09:31 +02:00
Florian Preinstorfer
4d61da30d0 Use an IPv4 address range suitable for documentation 2025-08-22 12:09:31 +02:00
Kristoffer Dalby
b87567628a derp: increase update frequency and harden on failures (#2741) 2025-08-22 10:40:38 +02:00
dotlambda
51c6367bb1 Correctly document the default for dns.override_local_dns 2025-08-19 15:02:49 +02:00
Florian Preinstorfer
be337c6a33 Enable derp.server.verify_clients by default
This setting is already enabled in example-config.yaml but would default
to false if no key is set.
2025-08-19 11:30:44 +02:00
Shourya Gautam
086fcad7d9 Fix Internal server error on /verify (#2735)
* converted the returned error to an httpError
2025-08-18 14:39:42 +00:00
afranco
3e3c72ea6f docs(acls): Add example for allow/deny all acl policy 2025-08-18 16:13:14 +02:00
afranco
43f90d205e fix: allow all traffic if acls field is omited from the policy 2025-08-18 16:13:14 +02:00
Florian Preinstorfer
7b8b796a71 docs: connect Android using a preauthkey
Fixes: #2616
2025-08-18 16:06:17 +02:00
nblock
fa619ea9f3 Fix CHANGELOG for autogroup:member and autogroup:tagged (#2733) 2025-08-18 08:59:03 +02:00
Florian Preinstorfer
30a1f7e68e Log registrationID to simplify interactive node registration
Some clients such as Android make it hard to transfer the registrationID
to the server, its easier to get it from the server logs.
2025-08-15 17:11:38 +02:00
Florian Preinstorfer
30cec3aa2b Document ports in use
Ref: #1767
2025-08-14 09:24:09 +02:00
Fredrik Ekre
5d8a2c25ea OIDC: Query userinfo endpoint before verifying user
This patch includes some changes to the OIDC integration in particular:
 - Make sure that userinfo claims are queried *before* comparing the
   user with the configured allowed groups, email and email domain.
 - Update user with group claim from the userinfo endpoint which is
   required for allowed groups to work correctly. This is essentially a
   continuation of #2545.
 - Let userinfo claims take precedence over id token claims.

With these changes I have verified that Headscale works as expected
together with Authelia without the documented escape hatch [0], i.e.
everything works even if the id token only contain the iss and sub
claims.

[0]: https://www.authelia.com/integration/openid-connect/headscale/#configuration-escape-hatch
2025-08-11 17:51:16 +02:00
Jeff Emershaw
b4f7782fd8 support force flag for nodes backfillips 2025-08-10 13:31:24 +02:00
eyjhb
d77874373d feat: add robots.txt 2025-08-10 10:57:45 +02:00
Kristoffer Dalby
a058bf3cd3 mapper: produce map before poll (#2628) 2025-07-28 11:15:53 +02:00
Luke Watts
b2a18830ed docs: fix typos 2025-07-28 10:28:49 +02:00
Kristoffer Dalby
9779adc0b7 integration: run headscale with delve and debug symbols (#2689) 2025-07-24 17:44:09 +02:00
nblock
e7fe645be5 Fix invocation of golangci-lint (#2703) 2025-07-24 08:41:20 +02:00
Florian Preinstorfer
bcd80ee773 Add debugging and troubleshooting guide 2025-07-22 14:56:45 +02:00
Florian Preinstorfer
c04e17d82e Document valid log levels
Also change the order as the level seems more important than the format.
2025-07-22 14:56:45 +02:00
Florian Preinstorfer
98fc0563ac Bump version in docs 2025-07-22 14:56:45 +02:00
Kian-Meng Ang
3123d5286b Fix typos
Found via `codespell -L shs,hastable,userr`
2025-07-21 12:06:07 +02:00
Kristoffer Dalby
7fce5065c4 all: remove 32 bit support (#2692) 2025-07-16 13:32:59 +02:00
Florian Preinstorfer
a98d9bd05f The preauthkeys commands expect a user id instead of a username 2025-07-16 09:53:05 +02:00
Florian Preinstorfer
46c59a3fff Fix command in bug report template 2025-07-15 21:12:32 +02:00
1170 changed files with 221512 additions and 26277 deletions

View File

@@ -0,0 +1,870 @@
---
name: headscale-integration-tester
description: Use this agent when you need to execute, analyze, or troubleshoot Headscale integration tests. This includes running specific test scenarios, investigating test failures, interpreting test artifacts, validating end-to-end functionality, or ensuring integration test quality before releases. Examples: <example>Context: User has made changes to the route management code and wants to validate the changes work correctly. user: 'I've updated the route advertisement logic in poll.go. Can you run the relevant integration tests to make sure everything still works?' assistant: 'I'll use the headscale-integration-tester agent to run the subnet routing integration tests and analyze the results.' <commentary>Since the user wants to validate route-related changes with integration tests, use the headscale-integration-tester agent to execute the appropriate tests and analyze results.</commentary></example> <example>Context: A CI pipeline integration test is failing and the user needs help understanding why. user: 'The TestSubnetRouterMultiNetwork test is failing in CI. The logs show some timing issues but I can't figure out what's wrong.' assistant: 'Let me use the headscale-integration-tester agent to analyze the test failure and examine the artifacts.' <commentary>Since this involves analyzing integration test failures and interpreting test artifacts, use the headscale-integration-tester agent to investigate the issue.</commentary></example>
color: green
---
You are a specialist Quality Assurance Engineer with deep expertise in Headscale's integration testing system. You understand the Docker-based test infrastructure, real Tailscale client interactions, and the complex timing considerations involved in end-to-end network testing.
## Integration Test System Overview
The Headscale integration test system uses Docker containers running real Tailscale clients against a Headscale server. Tests validate end-to-end functionality including routing, ACLs, node lifecycle, and network coordination. The system is built around the `hi` (Headscale Integration) test runner in `cmd/hi/`.
## Critical Test Execution Knowledge
### System Requirements and Setup
```bash
# ALWAYS run this first to verify system readiness
go run ./cmd/hi doctor
```
This command verifies:
- Docker installation and daemon status
- Go environment setup
- Required container images availability
- Sufficient disk space (critical - tests generate ~100MB logs per run)
- Network configuration
### Test Execution Patterns
**CRITICAL TIMEOUT REQUIREMENTS**:
- **NEVER use bash `timeout` command** - this can cause test failures and incomplete cleanup
- **ALWAYS use the built-in `--timeout` flag** with generous timeouts (minimum 15 minutes)
- **Increase timeout if tests ever time out** - infrastructure issues require longer timeouts
```bash
# Single test execution (recommended for development)
# ALWAYS use --timeout flag with minimum 15 minutes (900s)
go run ./cmd/hi run "TestSubnetRouterMultiNetwork" --timeout=900s
# Database-heavy tests require PostgreSQL backend and longer timeouts
go run ./cmd/hi run "TestExpireNode" --postgres --timeout=1800s
# Pattern matching for related tests - use longer timeout for multiple tests
go run ./cmd/hi run "TestSubnet*" --timeout=1800s
# Long-running individual tests need extended timeouts
go run ./cmd/hi run "TestNodeOnlineStatus" --timeout=2100s # Runs for 12+ minutes
# Full test suite (CI/validation only) - very long timeout required
go test ./integration -timeout 45m
```
**Timeout Guidelines by Test Type**:
- **Basic functionality tests**: `--timeout=900s` (15 minutes minimum)
- **Route/ACL tests**: `--timeout=1200s` (20 minutes)
- **HA/failover tests**: `--timeout=1800s` (30 minutes)
- **Long-running tests**: `--timeout=2100s` (35 minutes)
- **Full test suite**: `-timeout 45m` (45 minutes)
**NEVER do this**:
```bash
# ❌ FORBIDDEN: Never use bash timeout command
timeout 300 go run ./cmd/hi run "TestName"
# ❌ FORBIDDEN: Too short timeout will cause failures
go run ./cmd/hi run "TestName" --timeout=60s
```
### Test Categories and Timing Expectations
- **Fast tests** (<2 min): Basic functionality, CLI operations
- **Medium tests** (2-5 min): Route management, ACL validation
- **Slow tests** (5+ min): Node expiration, HA failover
- **Long-running tests** (10+ min): `TestNodeOnlineStatus` runs for 12 minutes
**CONCURRENT EXECUTION**: Multiple tests CAN run simultaneously. Each test run gets a unique Run ID for isolation. See "Concurrent Execution and Run ID Isolation" section below.
## Test Artifacts and Log Analysis
### Artifact Structure
All test runs save comprehensive artifacts to `control_logs/TIMESTAMP-ID/`:
```
control_logs/20250713-213106-iajsux/
├── hs-testname-abc123.stderr.log # Headscale server error logs
├── hs-testname-abc123.stdout.log # Headscale server output logs
├── hs-testname-abc123.db # Database snapshot for post-mortem
├── hs-testname-abc123_metrics.txt # Prometheus metrics dump
├── hs-testname-abc123-mapresponses/ # Protocol-level debug data
├── ts-client-xyz789.stderr.log # Tailscale client error logs
├── ts-client-xyz789.stdout.log # Tailscale client output logs
└── ts-client-xyz789_status.json # Client network status dump
```
### Log Analysis Priority Order
When tests fail, examine artifacts in this specific order:
1. **Headscale server stderr logs** (`hs-*.stderr.log`): Look for errors, panics, database issues, policy evaluation failures
2. **Tailscale client stderr logs** (`ts-*.stderr.log`): Check for authentication failures, network connectivity issues
3. **MapResponse JSON files**: Protocol-level debugging for network map generation issues
4. **Client status dumps** (`*_status.json`): Network state and peer connectivity information
5. **Database snapshots** (`.db` files): For data consistency and state persistence issues
## Concurrent Execution and Run ID Isolation
### Overview
The integration test system supports running multiple tests concurrently on the same Docker daemon. Each test run is isolated through a unique Run ID that ensures containers, networks, and cleanup operations don't interfere with each other.
### Run ID Format and Usage
Each test run generates a unique Run ID in the format: `YYYYMMDD-HHMMSS-{6-char-hash}`
- Example: `20260109-104215-mdjtzx`
The Run ID is used for:
- **Container naming**: `ts-{runIDShort}-{version}-{hash}` (e.g., `ts-mdjtzx-1-74-fgdyls`)
- **Docker labels**: All containers get `hi.run-id={runID}` label
- **Log directories**: `control_logs/{runID}/`
- **Cleanup isolation**: Only containers with matching run ID are cleaned up
### Container Isolation Mechanisms
1. **Unique Container Names**: Each container includes the run ID for identification
2. **Docker Labels**: `hi.run-id` and `hi.test-type` labels on all containers
3. **Dynamic Port Allocation**: All ports use `{HostPort: "0"}` to let kernel assign free ports
4. **Per-Run Networks**: Network names include scenario hash for isolation
5. **Isolated Cleanup**: `killTestContainersByRunID()` only removes containers matching the run ID
### ⚠️ CRITICAL: Never Interfere with Other Test Runs
**FORBIDDEN OPERATIONS** when other tests may be running:
```bash
# ❌ NEVER do global container cleanup while tests are running
docker rm -f $(docker ps -q --filter "name=hs-")
docker rm -f $(docker ps -q --filter "name=ts-")
# ❌ NEVER kill all test containers
# This will destroy other agents' test sessions!
# ❌ NEVER prune all Docker resources during active tests
docker system prune -f # Only safe when NO tests are running
```
**SAFE OPERATIONS**:
```bash
# ✅ Clean up only YOUR test run's containers (by run ID)
# The test runner does this automatically via cleanup functions
# ✅ Clean stale (stopped/exited) containers only
# Pre-test cleanup only removes stopped containers, not running ones
# ✅ Check what's running before cleanup
docker ps --filter "name=headscale-test-suite" --format "{{.Names}}"
```
### Running Concurrent Tests
```bash
# Start multiple tests in parallel - each gets unique run ID
go run ./cmd/hi run "TestPingAllByIP" &
go run ./cmd/hi run "TestACLAllowUserDst" &
go run ./cmd/hi run "TestOIDCAuthenticationPingAll" &
# Monitor running test suites
docker ps --filter "name=headscale-test-suite" --format "table {{.Names}}\t{{.Status}}"
```
### Agent Session Isolation Rules
When working as an agent:
1. **Your run ID is unique**: Each test you start gets its own run ID
2. **Never clean up globally**: Only use run ID-specific cleanup
3. **Check before cleanup**: Verify no other tests are running if you need to prune resources
4. **Respect other sessions**: Other agents may have tests running concurrently
5. **Log directories are isolated**: Your artifacts are in `control_logs/{your-run-id}/`
### Identifying Your Containers
Your test containers can be identified by:
- The run ID in the container name
- The `hi.run-id` Docker label
- The test suite container: `headscale-test-suite-{your-run-id}`
```bash
# List containers for a specific run ID
docker ps --filter "label=hi.run-id=20260109-104215-mdjtzx"
# Get your run ID from the test output
# Look for: "Run ID: 20260109-104215-mdjtzx"
```
## Common Failure Patterns and Root Cause Analysis
### CRITICAL MINDSET: Code Issues vs Infrastructure Issues
**⚠️ IMPORTANT**: When tests fail, it is ALMOST ALWAYS a code issue with Headscale, NOT infrastructure problems. Do not immediately blame disk space, Docker issues, or timing unless you have thoroughly investigated the actual error logs first.
### Systematic Debugging Process
1. **Read the actual error message**: Don't assume - read the stderr logs completely
2. **Check Headscale server logs first**: Most issues originate from server-side logic
3. **Verify client connectivity**: Only after ruling out server issues
4. **Check timing patterns**: Use proper `EventuallyWithT` patterns
5. **Infrastructure as last resort**: Only blame infrastructure after code analysis
### Real Failure Patterns
#### 1. Timing Issues (Common but fixable)
```go
// ❌ Wrong: Immediate assertions after async operations
client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
nodes, _ := headscale.ListNodes()
require.Len(t, nodes[0].GetAvailableRoutes(), 1) // WILL FAIL
// ✅ Correct: Wait for async operations
client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
require.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
assert.Len(c, nodes[0].GetAvailableRoutes(), 1)
}, 10*time.Second, 100*time.Millisecond, "route should be advertised")
```
**Timeout Guidelines**:
- Route operations: 3-5 seconds
- Node state changes: 5-10 seconds
- Complex scenarios: 10-15 seconds
- Policy recalculation: 5-10 seconds
#### 2. NodeStore Synchronization Issues
Route advertisements must propagate through poll requests (`poll.go:420`). NodeStore updates happen at specific synchronization points after Hostinfo changes.
#### 3. Test Data Management Issues
```go
// ❌ Wrong: Assuming array ordering
require.Len(t, nodes[0].GetAvailableRoutes(), 1)
// ✅ Correct: Identify nodes by properties
expectedRoutes := map[string]string{"1": "10.33.0.0/16"}
for _, node := range nodes {
nodeIDStr := fmt.Sprintf("%d", node.GetId())
if route, shouldHaveRoute := expectedRoutes[nodeIDStr]; shouldHaveRoute {
// Test the specific node that should have the route
}
}
```
#### 4. Database Backend Differences
SQLite vs PostgreSQL have different timing characteristics:
- Use `--postgres` flag for database-intensive tests
- PostgreSQL generally has more consistent timing
- Some race conditions only appear with specific backends
## Resource Management and Cleanup
### Disk Space Management
Tests consume significant disk space (~100MB per run):
```bash
# Check available space before running tests
df -h
# Clean up test artifacts periodically
rm -rf control_logs/older-timestamp-dirs/
# Clean Docker resources
docker system prune -f
docker volume prune -f
```
### Container Cleanup
- Successful tests clean up automatically
- Failed tests may leave containers running
- Manually clean if needed: `docker ps -a` and `docker rm -f <containers>`
## Advanced Debugging Techniques
### Protocol-Level Debugging
MapResponse JSON files in `control_logs/*/hs-*-mapresponses/` contain:
- Network topology as sent to clients
- Peer relationships and visibility
- Route distribution and primary route selection
- Policy evaluation results
### Database State Analysis
Use the database snapshots for post-mortem analysis:
```bash
# SQLite examination
sqlite3 control_logs/TIMESTAMP/hs-*.db
.tables
.schema nodes
SELECT * FROM nodes WHERE name LIKE '%problematic%';
```
### Performance Analysis
Prometheus metrics dumps show:
- Request latencies and error rates
- NodeStore operation timing
- Database query performance
- Memory usage patterns
## Test Development and Quality Guidelines
### Proper Test Patterns
```go
// Always use EventuallyWithT for async operations
require.EventuallyWithT(t, func(c *assert.CollectT) {
// Test condition that may take time to become true
}, timeout, interval, "descriptive failure message")
// Handle node identification correctly
var targetNode *v1.Node
for _, node := range nodes {
if node.GetName() == expectedNodeName {
targetNode = node
break
}
}
require.NotNil(t, targetNode, "should find expected node")
```
### Quality Validation Checklist
- ✅ Tests use `EventuallyWithT` for asynchronous operations
- ✅ Tests don't rely on array ordering for node identification
- ✅ Proper cleanup and resource management
- ✅ Tests handle both success and failure scenarios
- ✅ Timing assumptions are realistic for operations being tested
- ✅ Error messages are descriptive and actionable
## Real-World Test Failure Patterns from HA Debugging
### Infrastructure vs Code Issues - Detailed Examples
**INFRASTRUCTURE FAILURES (Rare but Real)**:
1. **DNS Resolution in Auth Tests**: `failed to resolve "hs-pingallbyip-jax97k": no DNS fallback candidates remain`
- **Pattern**: Client containers can't resolve headscale server hostname during logout
- **Detection**: Error messages specifically mention DNS/hostname resolution
- **Solution**: Docker networking reset, not code changes
2. **Container Creation Timeouts**: Test gets stuck during client container setup
- **Pattern**: Tests hang indefinitely at container startup phase
- **Detection**: No progress in logs for >2 minutes during initialization
- **Solution**: `docker system prune -f` and retry
3. **Docker Resource Exhaustion**: Too many concurrent tests overwhelming system
- **Pattern**: Container creation timeouts, OOM kills, slow test execution
- **Detection**: System load high, Docker daemon slow to respond
- **Solution**: Reduce number of concurrent tests, wait for completion before starting more
**CODE ISSUES (99% of failures)**:
1. **Route Approval Process Failures**: Routes not getting approved when they should be
- **Pattern**: Tests expecting approved routes but finding none
- **Detection**: `SubnetRoutes()` returns empty when `AnnouncedRoutes()` shows routes
- **Root Cause**: Auto-approval logic bugs, policy evaluation issues
2. **NodeStore Synchronization Issues**: State updates not propagating correctly
- **Pattern**: Route changes not reflected in NodeStore or Primary Routes
- **Detection**: Logs show route announcements but no tracking updates
- **Root Cause**: Missing synchronization points in `poll.go:420` area
3. **HA Failover Architecture Issues**: Routes removed when nodes go offline
- **Pattern**: `TestHASubnetRouterFailover` fails because approved routes disappear
- **Detection**: Routes available on online nodes but lost when nodes disconnect
- **Root Cause**: Conflating route approval with node connectivity
### Critical Test Environment Setup
**Pre-Test Cleanup**:
The test runner automatically handles cleanup:
- **Before test**: Removes only stale (stopped/exited) containers - does NOT affect running tests
- **After test**: Removes only containers belonging to the specific run ID
```bash
# Only clean old log directories if disk space is low
rm -rf control_logs/202507*
df -h # Verify sufficient disk space
# SAFE: Clean only stale/stopped containers (does not affect running tests)
# The test runner does this automatically via cleanupStaleTestContainers()
# ⚠️ DANGEROUS: Only use when NO tests are running
docker system prune -f
```
**Environment Verification**:
```bash
# Verify system readiness
go run ./cmd/hi doctor
# Check what tests are currently running (ALWAYS check before global cleanup)
docker ps --filter "name=headscale-test-suite" --format "{{.Names}}"
```
### Specific Test Categories and Known Issues
#### Route-Related Tests (Primary Focus)
```bash
# Core route functionality - these should work first
# Note: Generous timeouts are required for reliable execution
go run ./cmd/hi run "TestSubnetRouteACL" --timeout=1200s
go run ./cmd/hi run "TestAutoApproveMultiNetwork" --timeout=1800s
go run ./cmd/hi run "TestHASubnetRouterFailover" --timeout=1800s
```
**Common Route Test Patterns**:
- Tests validate route announcement, approval, and distribution workflows
- Route state changes are asynchronous - may need `EventuallyWithT` wrappers
- Route approval must respect ACL policies - test expectations encode security requirements
- HA tests verify route persistence during node connectivity changes
#### Authentication Tests (Infrastructure-Prone)
```bash
# These tests are more prone to infrastructure issues
# Require longer timeouts due to auth flow complexity
go run ./cmd/hi run "TestAuthKeyLogoutAndReloginSameUser" --timeout=1200s
go run ./cmd/hi run "TestAuthWebFlowLogoutAndRelogin" --timeout=1200s
go run ./cmd/hi run "TestOIDCExpireNodesBasedOnTokenExpiry" --timeout=1800s
```
**Common Auth Test Infrastructure Failures**:
- DNS resolution during logout operations
- Container creation timeouts
- HTTP/2 stream errors (often symptoms, not root cause)
### Security-Critical Debugging Rules
**❌ FORBIDDEN CHANGES (Security & Test Integrity)**:
1. **Never change expected test outputs** - Tests define correct behavior contracts
- Changing `require.Len(t, routes, 3)` to `require.Len(t, routes, 2)` because test fails
- Modifying expected status codes, node counts, or route counts
- Removing assertions that are "inconvenient"
- **Why forbidden**: Test expectations encode business requirements and security policies
2. **Never bypass security mechanisms** - Security must never be compromised for convenience
- Using `AnnouncedRoutes()` instead of `SubnetRoutes()` in production code
- Skipping authentication or authorization checks
- **Why forbidden**: Security bypasses create vulnerabilities in production
3. **Never reduce test coverage** - Tests prevent regressions
- Removing test cases or assertions
- Commenting out "problematic" test sections
- **Why forbidden**: Reduced coverage allows bugs to slip through
**✅ ALLOWED CHANGES (Timing & Observability)**:
1. **Fix timing issues with proper async patterns**
```go
// ✅ GOOD: Add EventuallyWithT for async operations
require.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
assert.Len(c, nodes, expectedCount) // Keep original expectation
}, 10*time.Second, 100*time.Millisecond, "nodes should reach expected count")
```
- **Why allowed**: Fixes race conditions without changing business logic
2. **Add MORE observability and debugging**
- Additional logging statements
- More detailed error messages
- Extra assertions that verify intermediate states
- **Why allowed**: Better observability helps debug without changing behavior
3. **Improve test documentation**
- Add godoc comments explaining test purpose and business logic
- Document timing requirements and async behavior
- **Why encouraged**: Helps future maintainers understand intent
### Advanced Debugging Workflows
#### Route Tracking Debug Flow
```bash
# Run test with detailed logging and proper timeout
go run ./cmd/hi run "TestSubnetRouteACL" --timeout=1200s > test_output.log 2>&1
# Check route approval process
grep -E "(auto-approval|ApproveRoutesWithPolicy|PolicyManager)" test_output.log
# Check route tracking
tail -50 control_logs/*/hs-*.stderr.log | grep -E "(announced|tracking|SetNodeRoutes)"
# Check for security violations
grep -E "(AnnouncedRoutes.*SetNodeRoutes|bypass.*approval)" test_output.log
```
#### HA Failover Debug Flow
```bash
# Test HA failover specifically with adequate timeout
go run ./cmd/hi run "TestHASubnetRouterFailover" --timeout=1800s
# Check route persistence during disconnect
grep -E "(Disconnect|NodeWentOffline|PrimaryRoutes)" control_logs/*/hs-*.stderr.log
# Verify routes don't disappear inappropriately
grep -E "(removing.*routes|SetNodeRoutes.*empty)" control_logs/*/hs-*.stderr.log
```
### Test Result Interpretation Guidelines
#### Success Patterns to Look For
- `"updating node routes for tracking"` in logs
- Routes appearing in `announcedRoutes` logs
- Proper `ApproveRoutesWithPolicy` calls for auto-approval
- Routes persisting through node connectivity changes (HA tests)
#### Failure Patterns to Investigate
- `SubnetRoutes()` returning empty when `AnnouncedRoutes()` has routes
- Routes disappearing when nodes go offline (HA architectural issue)
- Missing `EventuallyWithT` causing timing race conditions
- Security bypass attempts using wrong route methods
### Critical Testing Methodology
**Phase-Based Testing Approach**:
1. **Phase 1**: Core route tests (ACL, auto-approval, basic functionality)
2. **Phase 2**: HA and complex route scenarios
3. **Phase 3**: Auth tests (infrastructure-sensitive, test last)
**Per-Test Process**:
1. Clean environment before each test
2. Monitor logs for route tracking and approval messages
3. Check artifacts in `control_logs/` if test fails
4. Focus on actual error messages, not assumptions
5. Document results and patterns discovered
## Test Documentation and Code Quality Standards
### Adding Missing Test Documentation
When you understand a test's purpose through debugging, always add comprehensive godoc:
```go
// TestSubnetRoutes validates the complete subnet route lifecycle including
// advertisement from clients, policy-based approval, and distribution to peers.
// This test ensures that route security policies are properly enforced and that
// only approved routes are distributed to the network.
//
// The test verifies:
// - Route announcements are received and tracked
// - ACL policies control route approval correctly
// - Only approved routes appear in peer network maps
// - Route state persists correctly in the database
func TestSubnetRoutes(t *testing.T) {
// Test implementation...
}
```
**Why add documentation**: Future maintainers need to understand business logic and security requirements encoded in tests.
### Comment Guidelines - Focus on WHY, Not WHAT
```go
// ✅ GOOD: Explains reasoning and business logic
// Wait for route propagation because NodeStore updates are asynchronous
// and happen after poll requests complete processing
require.EventuallyWithT(t, func(c *assert.CollectT) {
// Check that security policies are enforced...
}, timeout, interval, "route approval must respect ACL policies")
// ❌ BAD: Just describes what the code does
// Wait for routes
require.EventuallyWithT(t, func(c *assert.CollectT) {
// Get routes and check length
}, timeout, interval, "checking routes")
```
**Why focus on WHY**: Helps maintainers understand architectural decisions and security requirements.
## EventuallyWithT Pattern for External Calls
### Overview
EventuallyWithT is a testing pattern used to handle eventual consistency in distributed systems. In Headscale integration tests, many operations are asynchronous - clients advertise routes, the server processes them, updates propagate through the network. EventuallyWithT allows tests to wait for these operations to complete while making assertions.
### External Calls That Must Be Wrapped
The following operations are **external calls** that interact with the headscale server or tailscale clients and MUST be wrapped in EventuallyWithT:
- `headscale.ListNodes()` - Queries server state
- `client.Status()` - Gets client network status
- `client.Curl()` - Makes HTTP requests through the network
- `client.Traceroute()` - Performs network diagnostics
- `client.Execute()` when running commands that query state
- Any operation that reads from the headscale server or tailscale client
### Five Key Rules for EventuallyWithT
1. **One External Call Per EventuallyWithT Block**
- Each EventuallyWithT should make ONE external call (e.g., ListNodes OR Status)
- Related assertions based on that single call can be grouped together
- Unrelated external calls must be in separate EventuallyWithT blocks
2. **Variable Scoping**
- Declare variables that need to be shared across EventuallyWithT blocks at function scope
- Use `=` for assignment inside EventuallyWithT, not `:=` (unless the variable is only used within that block)
- Variables declared with `:=` inside EventuallyWithT are not accessible outside
3. **No Nested EventuallyWithT**
- NEVER put an EventuallyWithT inside another EventuallyWithT
- This is a critical anti-pattern that must be avoided
4. **Use CollectT for Assertions**
- Inside EventuallyWithT, use `assert` methods with the CollectT parameter
- Helper functions called within EventuallyWithT must accept `*assert.CollectT`
5. **Descriptive Messages**
- Always provide a descriptive message as the last parameter
- Message should explain what condition is being waited for
### Correct Pattern Examples
```go
// CORRECT: Single external call with related assertions
var nodes []*v1.Node
var err error
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err = headscale.ListNodes()
assert.NoError(c, err)
assert.Len(c, nodes, 2)
// These assertions are all based on the ListNodes() call
requireNodeRouteCountWithCollect(c, nodes[0], 2, 2, 2)
requireNodeRouteCountWithCollect(c, nodes[1], 1, 1, 1)
}, 10*time.Second, 500*time.Millisecond, "nodes should have expected route counts")
// CORRECT: Separate EventuallyWithT for different external call
assert.EventuallyWithT(t, func(c *assert.CollectT) {
status, err := client.Status()
assert.NoError(c, err)
// All these assertions are based on the single Status() call
for _, peerKey := range status.Peers() {
peerStatus := status.Peer[peerKey]
requirePeerSubnetRoutesWithCollect(c, peerStatus, expectedPrefixes)
}
}, 10*time.Second, 500*time.Millisecond, "client should see expected routes")
// CORRECT: Variable scoping for sharing between blocks
var routeNode *v1.Node
var nodeKey key.NodePublic
// First EventuallyWithT to get the node
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
for _, node := range nodes {
if node.GetName() == "router" {
routeNode = node
nodeKey, _ = key.ParseNodePublicUntyped(mem.S(node.GetNodeKey()))
break
}
}
assert.NotNil(c, routeNode, "should find router node")
}, 10*time.Second, 100*time.Millisecond, "router node should exist")
// Second EventuallyWithT using the nodeKey from first block
assert.EventuallyWithT(t, func(c *assert.CollectT) {
status, err := client.Status()
assert.NoError(c, err)
peerStatus, ok := status.Peer[nodeKey]
assert.True(c, ok, "peer should exist in status")
requirePeerSubnetRoutesWithCollect(c, peerStatus, expectedPrefixes)
}, 10*time.Second, 100*time.Millisecond, "routes should be visible to client")
```
### Incorrect Patterns to Avoid
```go
// INCORRECT: Multiple unrelated external calls in same EventuallyWithT
assert.EventuallyWithT(t, func(c *assert.CollectT) {
// First external call
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
assert.Len(c, nodes, 2)
// Second unrelated external call - WRONG!
status, err := client.Status()
assert.NoError(c, err)
assert.NotNil(c, status)
}, 10*time.Second, 500*time.Millisecond, "mixed operations")
// INCORRECT: Nested EventuallyWithT
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
// NEVER do this!
assert.EventuallyWithT(t, func(c2 *assert.CollectT) {
status, _ := client.Status()
assert.NotNil(c2, status)
}, 5*time.Second, 100*time.Millisecond, "nested")
}, 10*time.Second, 500*time.Millisecond, "outer")
// INCORRECT: Variable scoping error
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes() // This shadows outer 'nodes' variable
assert.NoError(c, err)
}, 10*time.Second, 500*time.Millisecond, "get nodes")
// This will fail - nodes is nil because := created a new variable inside the block
require.Len(t, nodes, 2) // COMPILATION ERROR or nil pointer
// INCORRECT: Not wrapping external calls
nodes, err := headscale.ListNodes() // External call not wrapped!
require.NoError(t, err)
```
### Helper Functions for EventuallyWithT
When creating helper functions for use within EventuallyWithT:
```go
// Helper function that accepts CollectT
func requireNodeRouteCountWithCollect(c *assert.CollectT, node *v1.Node, available, approved, primary int) {
assert.Len(c, node.GetAvailableRoutes(), available, "available routes for node %s", node.GetName())
assert.Len(c, node.GetApprovedRoutes(), approved, "approved routes for node %s", node.GetName())
assert.Len(c, node.GetPrimaryRoutes(), primary, "primary routes for node %s", node.GetName())
}
// Usage within EventuallyWithT
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
requireNodeRouteCountWithCollect(c, nodes[0], 2, 2, 2)
}, 10*time.Second, 500*time.Millisecond, "route counts should match expected")
```
### Operations That Must NOT Be Wrapped
**CRITICAL**: The following operations are **blocking/mutating operations** that change state and MUST NOT be wrapped in EventuallyWithT:
- `tailscale set` commands (e.g., `--advertise-routes`, `--accept-routes`)
- `headscale.ApproveRoute()` - Approves routes on server
- `headscale.CreateUser()` - Creates users
- `headscale.CreatePreAuthKey()` - Creates authentication keys
- `headscale.RegisterNode()` - Registers new nodes
- Any `client.Execute()` that modifies configuration
- Any operation that creates, updates, or deletes resources
These operations:
1. Complete synchronously or fail immediately
2. Should not be retried automatically
3. Need explicit error handling with `require.NoError()`
### Correct Pattern for Blocking Operations
```go
// CORRECT: Blocking operation NOT wrapped
status := client.MustStatus()
command := []string{"tailscale", "set", "--advertise-routes=" + expectedRoutes[string(status.Self.ID)]}
_, _, err = client.Execute(command)
require.NoErrorf(t, err, "failed to advertise route: %s", err)
// Then wait for the result with EventuallyWithT
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
assert.Contains(c, nodes[0].GetAvailableRoutes(), expectedRoutes[string(status.Self.ID)])
}, 10*time.Second, 100*time.Millisecond, "route should be advertised")
// INCORRECT: Blocking operation wrapped (DON'T DO THIS)
assert.EventuallyWithT(t, func(c *assert.CollectT) {
_, _, err = client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
assert.NoError(c, err) // This might retry the command multiple times!
}, 10*time.Second, 100*time.Millisecond, "advertise routes")
```
### Assert vs Require Pattern
When working within EventuallyWithT blocks where you need to prevent panics:
```go
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
// For array bounds - use require with t to prevent panic
assert.Len(c, nodes, 6) // Test expectation
require.GreaterOrEqual(t, len(nodes), 3, "need at least 3 nodes to avoid panic")
// For nil pointer access - use require with t before dereferencing
assert.NotNil(c, srs1PeerStatus.PrimaryRoutes) // Test expectation
require.NotNil(t, srs1PeerStatus.PrimaryRoutes, "primary routes must be set to avoid panic")
assert.Contains(c,
srs1PeerStatus.PrimaryRoutes.AsSlice(),
pref,
)
}, 5*time.Second, 200*time.Millisecond, "checking route state")
```
**Key Principle**:
- Use `assert` with `c` (*assert.CollectT) for test expectations that can be retried
- Use `require` with `t` (*testing.T) for MUST conditions that prevent panics
- Within EventuallyWithT, both are available - choose based on whether failure would cause a panic
### Common Scenarios
1. **Waiting for route advertisement**:
```go
client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
assert.Contains(c, nodes[0].GetAvailableRoutes(), "10.0.0.0/24")
}, 10*time.Second, 100*time.Millisecond, "route should be advertised")
```
2. **Checking client sees routes**:
```go
assert.EventuallyWithT(t, func(c *assert.CollectT) {
status, err := client.Status()
assert.NoError(c, err)
// Check all peers have expected routes
for _, peerKey := range status.Peers() {
peerStatus := status.Peer[peerKey]
assert.Contains(c, peerStatus.AllowedIPs, expectedPrefix)
}
}, 10*time.Second, 100*time.Millisecond, "all peers should see route")
```
3. **Sequential operations**:
```go
// First wait for node to appear
var nodeID uint64
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
assert.Len(c, nodes, 1)
nodeID = nodes[0].GetId()
}, 10*time.Second, 100*time.Millisecond, "node should register")
// Then perform operation
_, err := headscale.ApproveRoute(nodeID, "10.0.0.0/24")
require.NoError(t, err)
// Then wait for result
assert.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
assert.Contains(c, nodes[0].GetApprovedRoutes(), "10.0.0.0/24")
}, 10*time.Second, 100*time.Millisecond, "route should be approved")
```
## Your Core Responsibilities
1. **Test Execution Strategy**: Execute integration tests with appropriate configurations, understanding when to use `--postgres` and timing requirements for different test categories. Follow phase-based testing approach prioritizing route tests.
- **Why this priority**: Route tests are less infrastructure-sensitive and validate core security logic
2. **Systematic Test Analysis**: When tests fail, systematically examine artifacts starting with Headscale server logs, then client logs, then protocol data. Focus on CODE ISSUES first (99% of cases), not infrastructure. Use real-world failure patterns to guide investigation.
- **Why this approach**: Most failures are logic bugs, not environment issues - efficient debugging saves time
3. **Timing & Synchronization Expertise**: Understand asynchronous Headscale operations, particularly route advertisements, NodeStore synchronization at `poll.go:420`, and policy propagation. Fix timing with `EventuallyWithT` while preserving original test expectations.
- **Why preserve expectations**: Test assertions encode business requirements and security policies
- **Key Pattern**: Apply the EventuallyWithT pattern correctly for all external calls as documented above
4. **Root Cause Analysis**: Distinguish between actual code regressions (route approval logic, HA failover architecture), timing issues requiring `EventuallyWithT` patterns, and genuine infrastructure problems (DNS, Docker, container issues).
- **Why this distinction matters**: Different problem types require completely different solution approaches
- **EventuallyWithT Issues**: Often manifest as flaky tests or immediate assertion failures after async operations
5. **Security-Aware Quality Validation**: Ensure tests properly validate end-to-end functionality with realistic timing expectations and proper error handling. Never suggest security bypasses or test expectation changes. Add comprehensive godoc when you understand test business logic.
- **Why security focus**: Integration tests are the last line of defense against security regressions
- **EventuallyWithT Usage**: Proper use prevents race conditions without weakening security assertions
6. **Concurrent Execution Awareness**: Respect run ID isolation and never interfere with other agents' test sessions. Each test run has a unique run ID - only clean up YOUR containers (by run ID label), never perform global cleanup while tests may be running.
- **Why this matters**: Multiple agents/users may run tests concurrently on the same Docker daemon
- **Key Rule**: NEVER use global container cleanup commands - the test runner handles cleanup automatically per run ID
**CRITICAL PRINCIPLE**: Test expectations are sacred contracts that define correct system behavior. When tests fail, fix the code to match the test, never change the test to match broken code. Only timing and observability improvements are allowed - business logic expectations are immutable.
**ISOLATION PRINCIPLE**: Each test run is isolated by its unique Run ID. Never interfere with other test sessions. The system handles cleanup automatically - manual global cleanup commands are forbidden when other tests may be running.
**EventuallyWithT PRINCIPLE**: Every external call to headscale server or tailscale client must be wrapped in EventuallyWithT. Follow the five key rules strictly: one external call per block, proper variable scoping, no nesting, use CollectT for assertions, and provide descriptive messages.
**Remember**: Test failures are usually code issues in Headscale that need to be fixed, not infrastructure problems to be ignored. Use the specific debugging workflows and failure patterns documented above to efficiently identify root causes. Infrastructure issues have very specific signatures - everything else is code-related.

View File

@@ -21,4 +21,3 @@ LICENSE
node_modules/
package-lock.json
package.json

16
.editorconfig Normal file
View File

@@ -0,0 +1,16 @@
root = true
[*]
charset = utf-8
end_of_line = lf
indent_size = 2
indent_style = space
insert_final_newline = true
trim_trailing_whitespace = true
max_line_length = 120
[*.go]
indent_style = tab
[Makefile]
indent_style = tab

View File

@@ -6,8 +6,7 @@ body:
- type: checkboxes
attributes:
label: Is this a support request?
description:
This issue tracker is for bugs and feature requests only. If you need
description: This issue tracker is for bugs and feature requests only. If you need
help, please use ask in our Discord community
options:
- label: This is not a support request
@@ -15,8 +14,7 @@ body:
- type: checkboxes
attributes:
label: Is there an existing issue for this?
description:
Please search to see if an issue already exists for the bug you
description: Please search to see if an issue already exists for the bug you
encountered.
options:
- label: I have searched the existing issues
@@ -52,12 +50,15 @@ body:
If you are using a container, always provide the headscale version and not only the Docker image version.
Please do not put "latest".
Describe your "headscale network". Is there a lot of nodes, are the nodes all interconnected, are some subnet routers?
If you are experiencing a problem during an upgrade, please provide the versions of the old and new versions of Headscale and Tailscale.
examples:
- **OS**: Ubuntu 24.04
- **Headscale version**: 0.24.3
- **Tailscale version**: 1.80.0
- **Number of nodes**: 20
value: |
- OS:
- Headscale version:
@@ -77,6 +78,10 @@ body:
attributes:
label: Debug information
description: |
Please have a look at our [Debugging and troubleshooting
guide](https://headscale.net/development/ref/debug/) to learn about
common debugging techniques.
Links? References? Anything that will give us more context about the issue you are encountering.
If **any** of these are omitted we will likely close your issue, do **not** ignore them.
@@ -92,7 +97,7 @@ body:
`tailscale status --json > DESCRIPTIVE_NAME.json`
Get the logs of a Tailscale client that is not working as expected.
`tailscale daemon-logs`
`tailscale debug daemon-logs`
Tip: You can attach images or log files by clicking this area to highlight it and then dragging files in.
**Ensure** you use formatting for files you attach.

View File

@@ -3,9 +3,9 @@ blank_issues_enabled: false
# Contact links
contact_links:
- name: "headscale usage documentation"
url: "https://github.com/juanfont/headscale/blob/main/docs"
about: "Find documentation about how to configure and run headscale."
- name: "headscale Discord community"
url: "https://discord.gg/xGj2TuqyxY"
url: "https://discord.gg/c84AZQhmpx"
about: "Please ask and answer questions about usage of headscale here."
- name: "headscale usage documentation"
url: "https://headscale.net/"
about: "Find documentation about how to configure and run headscale."

View File

@@ -16,15 +16,13 @@ body:
- type: textarea
attributes:
label: Description
description:
A clear and precise description of what new or changed feature you want.
description: A clear and precise description of what new or changed feature you want.
validations:
required: true
- type: checkboxes
attributes:
label: Contribution
description:
Are you willing to contribute to the implementation of this feature?
description: Are you willing to contribute to the implementation of this feature?
options:
- label: I can write the design doc for this feature
required: false
@@ -33,7 +31,6 @@ body:
- type: textarea
attributes:
label: How can it be implemented?
description:
Free text for your ideas on how this feature could be implemented.
description: Free text for your ideas on how this feature could be implemented.
validations:
required: false

View File

@@ -0,0 +1,80 @@
Thank you for taking the time to report this issue.
To help us investigate and resolve this, we need more information. Please provide the following:
> [!TIP]
> Most issues turn out to be configuration errors rather than bugs. We encourage you to discuss your problem in our [Discord community](https://discord.gg/c84AZQhmpx) **before** opening an issue. The community can often help identify misconfigurations quickly, saving everyone time.
## Required Information
### Environment Details
- **Headscale version**: (run `headscale version`)
- **Tailscale client version**: (run `tailscale version`)
- **Operating System**: (e.g., Ubuntu 24.04, macOS 14, Windows 11)
- **Deployment method**: (binary, Docker, Kubernetes, etc.)
- **Reverse proxy**: (if applicable: nginx, Traefik, Caddy, etc. - include configuration)
### Debug Information
Please follow our [Debugging and Troubleshooting Guide](https://headscale.net/stable/ref/debug/) and provide:
1. **Client netmap dump** (from affected Tailscale client):
```bash
tailscale debug netmap > netmap.json
```
2. **Client status dump** (from affected Tailscale client):
```bash
tailscale status --json > status.json
```
3. **Tailscale client logs** (if experiencing client issues):
```bash
tailscale debug daemon-logs
```
> [!IMPORTANT]
> We need logs from **multiple nodes** to understand the full picture:
>
> - The node(s) initiating connections
> - The node(s) being connected to
>
> Without logs from both sides, we cannot diagnose connectivity issues.
4. **Headscale server logs** with `log.level: trace` enabled
5. **Headscale configuration** (with sensitive values redacted - see rules below)
6. **ACL/Policy configuration** (if using ACLs)
7. **Proxy/Docker configuration** (if applicable - nginx.conf, docker-compose.yml, Traefik config, etc.)
## Formatting Requirements
- **Attach long files** - Do not paste large logs or configurations inline. Use GitHub file attachments or GitHub Gists.
- **Use proper Markdown** - Format code blocks, logs, and configurations with appropriate syntax highlighting.
- **Structure your response** - Use the headings above to organize your information clearly.
## Redaction Rules
> [!CAUTION]
> **Replace, do not remove.** Removing information makes debugging impossible.
When redacting sensitive information:
- ✅ **Replace consistently** - If you change `alice@company.com` to `user1@example.com`, use `user1@example.com` everywhere (logs, config, policy, etc.)
- ✅ **Use meaningful placeholders** - `user1@example.com`, `bob@example.com`, `my-secret-key` are acceptable
- ❌ **Never remove information** - Gaps in data prevent us from correlating events across logs
- ❌ **Never redact IP addresses** - We need the actual IPs to trace network paths and identify issues
**If redaction rules are not followed, we will be unable to debug the issue and will have to close it.**
---
**Note:** This issue will be automatically closed in 3 days if no additional information is provided. Once you reply with the requested information, the `needs-more-info` label will be removed automatically.
If you need help gathering this information, please visit our [Discord community](https://discord.gg/c84AZQhmpx).

View File

@@ -0,0 +1,15 @@
Thank you for reaching out.
This issue tracker is used for **bug reports and feature requests** only. Your question appears to be a support or configuration question rather than a bug report.
For help with setup, configuration, or general questions, please visit our [Discord community](https://discord.gg/c84AZQhmpx) where the community and maintainers can assist you in real-time.
**Before posting in Discord, please check:**
- [Documentation](https://headscale.net/)
- [FAQ](https://headscale.net/stable/faq/)
- [Debugging and Troubleshooting Guide](https://headscale.net/stable/ref/debug/)
If after troubleshooting you determine this is actually a bug, please open a new issue with the required debug information from the troubleshooting guide.
This issue has been automatically closed.

View File

@@ -5,8 +5,6 @@ on:
branches:
- main
pull_request:
branches:
- main
concurrency:
group: ${{ github.workflow }}-$${{ github.head_ref || github.run_id }}
@@ -17,7 +15,7 @@ jobs:
runs-on: ubuntu-latest
permissions: write-all
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 2
- name: Get changed files
@@ -31,13 +29,12 @@ jobs:
- '**/*.go'
- 'integration_test/'
- 'config-example.yaml'
- uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
if: steps.changed-files.outputs.files == 'true'
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
if: steps.changed-files.outputs.files == 'true'
with:
primary-key:
nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
@@ -57,7 +54,7 @@ jobs:
exit $BUILD_STATUS
- name: Nix gosum diverging
uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
if: failure() && steps.build.outcome == 'failure'
with:
github-token: ${{secrets.GITHUB_TOKEN}}
@@ -69,7 +66,7 @@ jobs:
body: 'Nix build failed with wrong gosum, please update "vendorSha256" (${{ steps.build.outputs.OLD_HASH }}) for the "headscale" package in flake.nix with the new SHA: ${{ steps.build.outputs.NEW_HASH }}'
})
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
- uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
if: steps.changed-files.outputs.files == 'true'
with:
name: headscale-linux
@@ -79,29 +76,25 @@ jobs:
strategy:
matrix:
env:
- "GOARCH=arm GOOS=linux GOARM=5"
- "GOARCH=arm GOOS=linux GOARM=6"
- "GOARCH=arm GOOS=linux GOARM=7"
- "GOARCH=arm64 GOOS=linux"
- "GOARCH=386 GOOS=linux"
- "GOARCH=amd64 GOOS=linux"
- "GOARCH=arm64 GOOS=darwin"
- "GOARCH=amd64 GOOS=darwin"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
with:
primary-key:
nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
- name: Run go cross compile
run:
env ${{ matrix.env }} nix develop --command -- go build -o "headscale"
env:
CGO_ENABLED: 0
run: env ${{ matrix.env }} nix develop --command -- go build -o "headscale"
./cmd/headscale
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
- uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: "headscale-${{ matrix.env }}"
path: "headscale"

55
.github/workflows/check-generated.yml vendored Normal file
View File

@@ -0,0 +1,55 @@
name: Check Generated Files
on:
push:
branches:
- main
pull_request:
branches:
- main
concurrency:
group: ${{ github.workflow }}-$${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
check-generated:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 2
- name: Get changed files
id: changed-files
uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
with:
filters: |
files:
- '*.nix'
- 'go.*'
- '**/*.go'
- '**/*.proto'
- 'buf.gen.yaml'
- 'tools/**'
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
if: steps.changed-files.outputs.files == 'true'
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
if: steps.changed-files.outputs.files == 'true'
with:
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix', '**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
- name: Run make generate
if: steps.changed-files.outputs.files == 'true'
run: nix develop --command -- make generate
- name: Check for uncommitted changes
if: steps.changed-files.outputs.files == 'true'
run: |
if ! git diff --exit-code; then
echo "❌ Generated files are not up to date!"
echo "Please run 'make generate' and commit the changes."
exit 1
else
echo "✅ All generated files are up to date."
fi

View File

@@ -10,7 +10,7 @@ jobs:
check-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 2
- name: Get changed files
@@ -24,13 +24,12 @@ jobs:
- '**/*.go'
- 'integration_test/'
- 'config-example.yaml'
- uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
if: steps.changed-files.outputs.files == 'true'
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
if: steps.changed-files.outputs.files == 'true'
with:
primary-key:
nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

112
.github/workflows/container-main.yml vendored Normal file
View File

@@ -0,0 +1,112 @@
---
name: Build (main)
on:
push:
branches:
- main
paths:
- "*.nix"
- "go.*"
- "**/*.go"
- ".github/workflows/container-main.yml"
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.sha }}
cancel-in-progress: true
jobs:
container:
if: github.repository == 'juanfont/headscale'
runs-on: ubuntu-latest
permissions:
packages: write
contents: read
steps:
- name: Checkout
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Login to DockerHub
uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Login to GHCR
uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0
with:
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
with:
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
- name: Set commit timestamp
run: echo "SOURCE_DATE_EPOCH=$(git log -1 --format=%ct)" >> $GITHUB_ENV
- name: Build and push to GHCR
env:
KO_DOCKER_REPO: ghcr.io/juanfont/headscale
KO_DEFAULTBASEIMAGE: gcr.io/distroless/base-debian13
CGO_ENABLED: "0"
run: |
nix develop --command -- ko build \
--bare \
--platform=linux/amd64,linux/arm64 \
--tags=main-${GITHUB_SHA::7} \
./cmd/headscale
- name: Push to Docker Hub
env:
KO_DOCKER_REPO: headscale/headscale
KO_DEFAULTBASEIMAGE: gcr.io/distroless/base-debian13
CGO_ENABLED: "0"
run: |
nix develop --command -- ko build \
--bare \
--platform=linux/amd64,linux/arm64 \
--tags=main-${GITHUB_SHA::7} \
./cmd/headscale
binaries:
if: github.repository == 'juanfont/headscale'
runs-on: ubuntu-latest
strategy:
matrix:
include:
- goos: linux
goarch: amd64
- goos: linux
goarch: arm64
- goos: darwin
goarch: amd64
- goos: darwin
goarch: arm64
steps:
- name: Checkout
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
with:
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
- name: Build binary
env:
CGO_ENABLED: "0"
GOOS: ${{ matrix.goos }}
GOARCH: ${{ matrix.goarch }}
run: nix develop --command -- go build -o headscale ./cmd/headscale
- uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: headscale-${{ matrix.goos }}-${{ matrix.goarch }}
path: headscale

View File

@@ -21,15 +21,15 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 0
- name: Install python
uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v6.1.0
with:
python-version: 3.x
- name: Setup cache
uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
uses: actions/cache@a7833574556fa59680c1b7cb190c1735db73ebf0 # v5.0.0
with:
key: ${{ github.ref }}
path: .cache

View File

@@ -11,13 +11,13 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- name: Install python
uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v6.1.0
with:
python-version: 3.x
- name: Setup cache
uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
uses: actions/cache@a7833574556fa59680c1b7cb190c1735db73ebf0 # v5.0.0
with:
key: ${{ github.ref }}
path: .cache

View File

@@ -10,6 +10,55 @@ import (
"strings"
)
// testsToSplit defines tests that should be split into multiple CI jobs.
// Key is the test function name, value is a list of subtest prefixes.
// Each prefix becomes a separate CI job as "TestName/prefix".
//
// Example: TestAutoApproveMultiNetwork has subtests like:
// - TestAutoApproveMultiNetwork/authkey-tag-advertiseduringup-false-pol-database
// - TestAutoApproveMultiNetwork/webauth-user-advertiseduringup-true-pol-file
//
// Splitting by approver type (tag, user, group) creates 6 CI jobs with 4 tests each:
// - TestAutoApproveMultiNetwork/authkey-tag.* (4 tests)
// - TestAutoApproveMultiNetwork/authkey-user.* (4 tests)
// - TestAutoApproveMultiNetwork/authkey-group.* (4 tests)
// - TestAutoApproveMultiNetwork/webauth-tag.* (4 tests)
// - TestAutoApproveMultiNetwork/webauth-user.* (4 tests)
// - TestAutoApproveMultiNetwork/webauth-group.* (4 tests)
//
// This reduces load per CI job (4 tests instead of 12) to avoid infrastructure
// flakiness when running many sequential Docker-based integration tests.
var testsToSplit = map[string][]string{
"TestAutoApproveMultiNetwork": {
"authkey-tag",
"authkey-user",
"authkey-group",
"webauth-tag",
"webauth-user",
"webauth-group",
},
}
// expandTests takes a list of test names and expands any that need splitting
// into multiple subtest patterns.
func expandTests(tests []string) []string {
var expanded []string
for _, test := range tests {
if prefixes, ok := testsToSplit[test]; ok {
// This test should be split into multiple jobs.
// We append ".*" to each prefix because the CI runner wraps patterns
// with ^...$ anchors. Without ".*", a pattern like "authkey$" wouldn't
// match "authkey-tag-advertiseduringup-false-pol-database".
for _, prefix := range prefixes {
expanded = append(expanded, fmt.Sprintf("%s/%s.*", test, prefix))
}
} else {
expanded = append(expanded, test)
}
}
return expanded
}
func findTests() []string {
rgBin, err := exec.LookPath("rg")
if err != nil {
@@ -66,8 +115,11 @@ func updateYAML(tests []string, jobName string, testPath string) {
func main() {
tests := findTests()
quotedTests := make([]string, len(tests))
for i, test := range tests {
// Expand tests that should be split into multiple jobs
expandedTests := expandTests(tests)
quotedTests := make([]string, len(expandedTests))
for i, test := range expandedTests {
quotedTests[i] = fmt.Sprintf("\"%s\"", test)
}

View File

@@ -11,13 +11,13 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
# [Required] Access token with `workflow` scope.
token: ${{ secrets.WORKFLOW_SECRET }}
- name: Run GitHub Actions Version Updater
uses: saadmk11/github-actions-version-updater@64be81ba69383f81f2be476703ea6570c4c8686e # v0.8.1
uses: saadmk11/github-actions-version-updater@d8781caf11d11168579c8e5e94f62b068038f442 # v0.9.0
with:
# [Required] Access token with `workflow` scope.
token: ${{ secrets.WORKFLOW_SECRET }}

View File

@@ -16,7 +16,7 @@ on:
jobs:
test:
runs-on: ubuntu-latest
runs-on: ubuntu-24.04-arm
env:
# Github does not allow us to access secrets in pull requests,
# so this env var is used to check if we have the secret or not.
@@ -28,23 +28,12 @@ jobs:
# that triggered the build.
HAS_TAILSCALE_SECRET: ${{ secrets.TS_OAUTH_CLIENT_ID }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 2
- name: Get changed files
id: changed-files
uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
with:
filters: |
files:
- '*.nix'
- 'go.*'
- '**/*.go'
- 'integration_test/'
- 'config-example.yaml'
- name: Tailscale
if: ${{ env.HAS_TAILSCALE_SECRET }}
uses: tailscale/github-action@6986d2c82a91fbac2949fe01f5bab95cf21b5102 # v3.2.2
uses: tailscale/github-action@a392da0a182bba0e9613b6243ebd69529b1878aa # v4.1.0
with:
oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
@@ -52,44 +41,90 @@ jobs:
- name: Setup SSH server for Actor
if: ${{ env.HAS_TAILSCALE_SECRET }}
uses: alexellis/setup-sshd-actor@master
- uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
if: steps.changed-files.outputs.files == 'true'
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
if: steps.changed-files.outputs.files == 'true'
- name: Download headscale image
uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
with:
primary-key:
nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
name: headscale-image
path: /tmp/artifacts
- name: Download tailscale HEAD image
uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
with:
name: tailscale-head-image
path: /tmp/artifacts
- name: Download hi binary
uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
with:
name: hi-binary
path: /tmp/artifacts
- name: Download Go cache
uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
with:
name: go-cache
path: /tmp/artifacts
- name: Download postgres image
if: ${{ inputs.postgres_flag == '--postgres=1' }}
uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
with:
name: postgres-image
path: /tmp/artifacts
- name: Pin Docker to v28 (avoid v29 breaking changes)
run: |
# Docker 29 breaks docker build via Go client libraries and
# docker load/save with certain tarball formats.
# Pin to Docker 28.x until our tooling is updated.
# https://github.com/actions/runner-images/issues/13474
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update -qq
VERSION=$(apt-cache madison docker-ce | grep '28\.5' | head -1 | awk '{print $3}')
sudo apt-get install -y --allow-downgrades \
"docker-ce=${VERSION}" "docker-ce-cli=${VERSION}"
sudo systemctl restart docker
docker version
- name: Load Docker images, Go cache, and prepare binary
run: |
gunzip -c /tmp/artifacts/headscale-image.tar.gz | docker load
gunzip -c /tmp/artifacts/tailscale-head-image.tar.gz | docker load
if [ -f /tmp/artifacts/postgres-image.tar.gz ]; then
gunzip -c /tmp/artifacts/postgres-image.tar.gz | docker load
fi
chmod +x /tmp/artifacts/hi
docker images
# Extract Go cache to host directories for bind mounting
mkdir -p /tmp/go-cache
tar -xzf /tmp/artifacts/go-cache.tar.gz -C /tmp/go-cache
ls -la /tmp/go-cache/ /tmp/go-cache/.cache/
- name: Run Integration Test
uses: Wandalen/wretry.action@e68c23e6309f2871ca8ae4763e7629b9c258e1ea # v3.8.0
if: steps.changed-files.outputs.files == 'true'
env:
HEADSCALE_INTEGRATION_HEADSCALE_IMAGE: headscale:${{ github.sha }}
HEADSCALE_INTEGRATION_TAILSCALE_IMAGE: tailscale-head:${{ github.sha }}
HEADSCALE_INTEGRATION_POSTGRES_IMAGE: ${{ inputs.postgres_flag == '--postgres=1' && format('postgres:{0}', github.sha) || '' }}
HEADSCALE_INTEGRATION_GO_CACHE: /tmp/go-cache/go
HEADSCALE_INTEGRATION_GO_BUILD_CACHE: /tmp/go-cache/.cache/go-build
run: /tmp/artifacts/hi run --stats --ts-memory-limit=300 --hs-memory-limit=1500 "^${{ inputs.test }}$" \
--timeout=120m \
${{ inputs.postgres_flag }}
# Sanitize test name for artifact upload (replace invalid characters: " : < > | * ? \ / with -)
- name: Sanitize test name for artifacts
if: always()
id: sanitize
run: echo "name=${TEST_NAME//[\":<>|*?\\\/]/-}" >> $GITHUB_OUTPUT
env:
TEST_NAME: ${{ inputs.test }}
- uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
if: always()
with:
# Our integration tests are started like a thundering herd, often
# hitting limits of the various external repositories we depend on
# like docker hub. This will retry jobs every 5 min, 10 times,
# hopefully letting us avoid manual intervention and restarting jobs.
# One could of course argue that we should invest in trying to avoid
# this, but currently it seems like a larger investment to be cleverer
# about this.
# Some of the jobs might still require manual restart as they are really
# slow and this will cause them to eventually be killed by Github actions.
attempt_delay: 300000 # 5 min
attempt_limit: 2
command: |
nix develop --command -- hi run "^${{ inputs.test }}$" \
--timeout=120m \
${{ inputs.postgres_flag }}
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
if: always() && steps.changed-files.outputs.files == 'true'
with:
name: ${{ inputs.database_name }}-${{ inputs.test }}-logs
name: ${{ inputs.database_name }}-${{ steps.sanitize.outputs.name }}-logs
path: "control_logs/*/*.log"
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
if: always() && steps.changed-files.outputs.files == 'true'
- uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
if: always()
with:
name: ${{ inputs.database_name }}-${{ inputs.test }}-archives
path: "control_logs/*/*.tar"
name: ${{ inputs.database_name }}-${{ steps.sanitize.outputs.name }}-artifacts
path: control_logs/
- name: Setup a blocking tmux session
if: ${{ env.HAS_TAILSCALE_SECRET }}
uses: alexellis/block-with-tmux-action@master

View File

@@ -10,7 +10,7 @@ jobs:
golangci-lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 2
- name: Get changed files
@@ -24,13 +24,12 @@ jobs:
- '**/*.go'
- 'integration_test/'
- 'config-example.yaml'
- uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
if: steps.changed-files.outputs.files == 'true'
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
if: steps.changed-files.outputs.files == 'true'
with:
primary-key:
nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
@@ -38,12 +37,15 @@ jobs:
if: steps.changed-files.outputs.files == 'true'
run: nix develop --command -- golangci-lint run
--new-from-rev=${{github.event.pull_request.base.sha}}
--format=colored-line-number
--output.text.path=stdout
--output.text.print-linter-name
--output.text.print-issued-lines
--output.text.colors
prettier-lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 2
- name: Get changed files
@@ -62,13 +64,12 @@ jobs:
- '**/*.css'
- '**/*.scss'
- '**/*.html'
- uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
if: steps.changed-files.outputs.files == 'true'
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
if: steps.changed-files.outputs.files == 'true'
with:
primary-key:
nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
@@ -80,12 +81,11 @@ jobs:
proto-lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
with:
primary-key:
nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

View File

@@ -0,0 +1,28 @@
name: Needs More Info - Post Comment
on:
issues:
types: [labeled]
jobs:
post-comment:
if: >-
github.event.label.name == 'needs-more-info' &&
github.repository == 'juanfont/headscale'
runs-on: ubuntu-latest
permissions:
issues: write
contents: read
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
sparse-checkout: .github/label-response/needs-more-info.md
sparse-checkout-cone-mode: false
- name: Post instruction comment
run: gh issue comment "$NUMBER" --body-file .github/label-response/needs-more-info.md
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_REPO: ${{ github.repository }}
NUMBER: ${{ github.event.issue.number }}

View File

@@ -0,0 +1,98 @@
name: Needs More Info - Timer
on:
schedule:
- cron: "0 0 * * *" # Daily at midnight UTC
issue_comment:
types: [created]
workflow_dispatch:
jobs:
# When a non-bot user comments on a needs-more-info issue, remove the label.
remove-label-on-response:
if: >-
github.repository == 'juanfont/headscale' &&
github.event_name == 'issue_comment' &&
github.event.comment.user.type != 'Bot' &&
contains(github.event.issue.labels.*.name, 'needs-more-info')
runs-on: ubuntu-latest
permissions:
issues: write
steps:
- name: Remove needs-more-info label
run: gh issue edit "$NUMBER" --remove-label needs-more-info
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_REPO: ${{ github.repository }}
NUMBER: ${{ github.event.issue.number }}
# On schedule, close issues that have had no human response for 3 days.
close-stale:
if: >-
github.repository == 'juanfont/headscale' &&
github.event_name != 'issue_comment'
runs-on: ubuntu-latest
permissions:
issues: write
steps:
- uses: hustcer/setup-nu@920172d92eb04671776f3ba69d605d3b09351c30 # v3.22
with:
version: "*"
- name: Close stale needs-more-info issues
shell: nu {0}
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_REPO: ${{ github.repository }}
run: |
let issues = (gh issue list
--repo $env.GH_REPO
--label "needs-more-info"
--state open
--json number
| from json)
for issue in $issues {
let number = $issue.number
print $"Checking issue #($number)"
# Find when needs-more-info was last added
let events = (gh api $"repos/($env.GH_REPO)/issues/($number)/events"
--paginate | from json | flatten)
let label_event = ($events
| where event == "labeled" and label.name == "needs-more-info"
| last)
let label_added_at = ($label_event.created_at | into datetime)
# Check for non-bot comments after the label was added
let comments = (gh api $"repos/($env.GH_REPO)/issues/($number)/comments"
--paginate | from json | flatten)
let human_responses = ($comments
| where user.type != "Bot"
| where { ($in.created_at | into datetime) > $label_added_at })
if ($human_responses | length) > 0 {
print $" Human responded, removing label"
gh issue edit $number --repo $env.GH_REPO --remove-label needs-more-info
continue
}
# Check if 3 days have passed
let elapsed = (date now) - $label_added_at
if $elapsed < 3day {
print $" Only ($elapsed | format duration day) elapsed, skipping"
continue
}
print $" No response for ($elapsed | format duration day), closing"
let message = [
"This issue has been automatically closed because no additional information was provided within 3 days."
""
"If you have the requested information, please open a new issue and include the debug information requested above."
""
"Thank you for your understanding."
] | str join "\n"
gh issue comment $number --repo $env.GH_REPO --body $message
gh issue close $number --repo $env.GH_REPO --reason "not planned"
gh issue edit $number --repo $env.GH_REPO --remove-label needs-more-info
}

55
.github/workflows/nix-module-test.yml vendored Normal file
View File

@@ -0,0 +1,55 @@
name: NixOS Module Tests
on:
push:
branches:
- main
pull_request:
branches:
- main
concurrency:
group: ${{ github.workflow }}-$${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
nix-module-check:
runs-on: ubuntu-latest
permissions:
contents: read
steps:
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 2
- name: Get changed files
id: changed-files
uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
with:
filters: |
nix:
- 'nix/**'
- 'flake.nix'
- 'flake.lock'
go:
- 'go.*'
- '**/*.go'
- 'cmd/**'
- 'hscontrol/**'
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
if: steps.changed-files.outputs.nix == 'true' || steps.changed-files.outputs.go == 'true'
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
if: steps.changed-files.outputs.nix == 'true' || steps.changed-files.outputs.go == 'true'
with:
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
- name: Run NixOS module tests
if: steps.changed-files.outputs.nix == 'true' || steps.changed-files.outputs.go == 'true'
run: |
echo "Running NixOS module integration test..."
nix build .#checks.x86_64-linux.headscale -L

View File

@@ -13,28 +13,46 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 0
- name: Pin Docker to v28 (avoid v29 breaking changes)
run: |
# Docker 29 breaks docker build via Go client libraries and
# docker load/save with certain tarball formats.
# Pin to Docker 28.x until our tooling is updated.
# https://github.com/actions/runner-images/issues/13474
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update -qq
VERSION=$(apt-cache madison docker-ce | grep '28\.5' | head -1 | awk '{print $3}')
sudo apt-get install -y --allow-downgrades \
"docker-ce=${VERSION}" "docker-ce-cli=${VERSION}"
sudo systemctl restart docker
docker version
- name: Login to DockerHub
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Login to GHCR
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0
with:
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
with:
primary-key:
nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

View File

@@ -12,18 +12,16 @@ jobs:
issues: write
pull-requests: write
steps:
- uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0
- uses: actions/stale@997185467fa4f803885201cee163a9f38240193d # v10.1.1
with:
days-before-issue-stale: 90
days-before-issue-close: 7
stale-issue-label: "stale"
stale-issue-message:
"This issue is stale because it has been open for 90 days with no
stale-issue-message: "This issue is stale because it has been open for 90 days with no
activity."
close-issue-message:
"This issue was closed because it has been inactive for 14 days
close-issue-message: "This issue was closed because it has been inactive for 14 days
since being marked as stale."
days-before-pr-stale: -1
days-before-pr-close: -1
exempt-issue-labels: "no-stale-bot"
exempt-issue-labels: "no-stale-bot,needs-more-info"
repo-token: ${{ secrets.GITHUB_TOKEN }}

30
.github/workflows/support-request.yml vendored Normal file
View File

@@ -0,0 +1,30 @@
name: Support Request - Close Issue
on:
issues:
types: [labeled]
jobs:
close-support-request:
if: >-
github.event.label.name == 'support-request' &&
github.repository == 'juanfont/headscale'
runs-on: ubuntu-latest
permissions:
issues: write
contents: read
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
sparse-checkout: .github/label-response/support-request.md
sparse-checkout-cone-mode: false
- name: Post comment and close issue
run: |
gh issue comment "$NUMBER" --body-file .github/label-response/support-request.md
gh issue close "$NUMBER" --reason "not planned"
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_REPO: ${{ github.repository }}
NUMBER: ${{ github.event.issue.number }}

View File

@@ -7,7 +7,154 @@ concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
# build: Builds binaries and Docker images once, uploads as artifacts for reuse.
# build-postgres: Pulls postgres image separately to avoid Docker Hub rate limits.
# sqlite: Runs all integration tests with SQLite backend.
# postgres: Runs a subset of tests with PostgreSQL to verify database compatibility.
build:
runs-on: ubuntu-24.04-arm
outputs:
files-changed: ${{ steps.changed-files.outputs.files }}
steps:
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 2
- name: Get changed files
id: changed-files
uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
with:
filters: |
files:
- '*.nix'
- 'go.*'
- '**/*.go'
- 'integration/**'
- 'config-example.yaml'
- '.github/workflows/test-integration.yaml'
- '.github/workflows/integration-test-template.yml'
- 'Dockerfile.*'
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
if: steps.changed-files.outputs.files == 'true'
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
if: steps.changed-files.outputs.files == 'true'
with:
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix', '**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
- name: Build binaries and warm Go cache
if: steps.changed-files.outputs.files == 'true'
run: |
# Build all Go binaries in one nix shell to maximize cache reuse
nix develop --command -- bash -c '
go build -o hi ./cmd/hi
CGO_ENABLED=0 GOOS=linux go build -o headscale ./cmd/headscale
# Build integration test binary to warm the cache with all dependencies
go test -c ./integration -o /dev/null 2>/dev/null || true
'
- name: Upload hi binary
if: steps.changed-files.outputs.files == 'true'
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: hi-binary
path: hi
retention-days: 10
- name: Package Go cache
if: steps.changed-files.outputs.files == 'true'
run: |
# Package Go module cache and build cache
tar -czf go-cache.tar.gz -C ~ go .cache/go-build
- name: Upload Go cache
if: steps.changed-files.outputs.files == 'true'
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: go-cache
path: go-cache.tar.gz
retention-days: 10
- name: Pin Docker to v28 (avoid v29 breaking changes)
if: steps.changed-files.outputs.files == 'true'
run: |
# Docker 29 breaks docker build via Go client libraries and
# docker load/save with certain tarball formats.
# Pin to Docker 28.x until our tooling is updated.
# https://github.com/actions/runner-images/issues/13474
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update -qq
VERSION=$(apt-cache madison docker-ce | grep '28\.5' | head -1 | awk '{print $3}')
sudo apt-get install -y --allow-downgrades \
"docker-ce=${VERSION}" "docker-ce-cli=${VERSION}"
sudo systemctl restart docker
docker version
- name: Build headscale image
if: steps.changed-files.outputs.files == 'true'
run: |
docker build \
--file Dockerfile.integration-ci \
--tag headscale:${{ github.sha }} \
.
docker save headscale:${{ github.sha }} | gzip > headscale-image.tar.gz
- name: Build tailscale HEAD image
if: steps.changed-files.outputs.files == 'true'
run: |
docker build \
--file Dockerfile.tailscale-HEAD \
--tag tailscale-head:${{ github.sha }} \
.
docker save tailscale-head:${{ github.sha }} | gzip > tailscale-head-image.tar.gz
- name: Upload headscale image
if: steps.changed-files.outputs.files == 'true'
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: headscale-image
path: headscale-image.tar.gz
retention-days: 10
- name: Upload tailscale HEAD image
if: steps.changed-files.outputs.files == 'true'
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: tailscale-head-image
path: tailscale-head-image.tar.gz
retention-days: 10
build-postgres:
runs-on: ubuntu-24.04-arm
needs: build
if: needs.build.outputs.files-changed == 'true'
steps:
- name: Pin Docker to v28 (avoid v29 breaking changes)
run: |
# Docker 29 breaks docker build via Go client libraries and
# docker load/save with certain tarball formats.
# Pin to Docker 28.x until our tooling is updated.
# https://github.com/actions/runner-images/issues/13474
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update -qq
VERSION=$(apt-cache madison docker-ce | grep '28\.5' | head -1 | awk '{print $3}')
sudo apt-get install -y --allow-downgrades \
"docker-ce=${VERSION}" "docker-ce-cli=${VERSION}"
sudo systemctl restart docker
docker version
- name: Pull and save postgres image
run: |
docker pull postgres:latest
docker tag postgres:latest postgres:${{ github.sha }}
docker save postgres:${{ github.sha }} | gzip > postgres-image.tar.gz
- name: Upload postgres image
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: postgres-image
path: postgres-image.tar.gz
retention-days: 10
sqlite:
needs: build
if: needs.build.outputs.files-changed == 'true'
strategy:
fail-fast: false
matrix:
@@ -23,28 +170,48 @@ jobs:
- TestPolicyUpdateWhileRunningWithCLIInDatabase
- TestACLAutogroupMember
- TestACLAutogroupTagged
- TestACLAutogroupSelf
- TestACLPolicyPropagationOverTime
- TestACLTagPropagation
- TestACLTagPropagationPortSpecific
- TestACLGroupWithUnknownUser
- TestACLGroupAfterUserDeletion
- TestACLGroupDeletionExactReproduction
- TestACLDynamicUnknownUserAddition
- TestACLDynamicUnknownUserRemoval
- TestAPIAuthenticationBypass
- TestAPIAuthenticationBypassCurl
- TestGRPCAuthenticationBypass
- TestCLIWithConfigAuthenticationBypass
- TestAuthKeyLogoutAndReloginSameUser
- TestAuthKeyLogoutAndReloginNewUser
- TestAuthKeyLogoutAndReloginSameUserExpiredKey
- TestAuthKeyDeleteKey
- TestAuthKeyLogoutAndReloginRoutesPreserved
- TestOIDCAuthenticationPingAll
- TestOIDCExpireNodesBasedOnTokenExpiry
- TestOIDC024UserCreation
- TestOIDCAuthenticationWithPKCE
- TestOIDCReloginSameNodeNewUser
- TestOIDCFollowUpUrl
- TestOIDCMultipleOpenedLoginUrls
- TestOIDCReloginSameNodeSameUser
- TestOIDCExpiryAfterRestart
- TestOIDCACLPolicyOnJoin
- TestOIDCReloginSameUserRoutesPreserved
- TestAuthWebFlowAuthenticationPingAll
- TestAuthWebFlowLogoutAndRelogin
- TestAuthWebFlowLogoutAndReloginSameUser
- TestAuthWebFlowLogoutAndReloginNewUser
- TestUserCommand
- TestPreAuthKeyCommand
- TestPreAuthKeyCommandWithoutExpiry
- TestPreAuthKeyCommandReusableEphemeral
- TestPreAuthKeyCorrectUserLoggedInCommand
- TestTaggedNodesCLIOutput
- TestApiKeyCommand
- TestNodeTagCommand
- TestNodeAdvertiseTagCommand
- TestNodeCommand
- TestNodeExpireCommand
- TestNodeRenameCommand
- TestNodeMoveCommand
- TestPolicyCommand
- TestPolicyBrokenConfigCommand
- TestDERPVerifyEndpoint
@@ -61,17 +228,27 @@ jobs:
- TestTaildrop
- TestUpdateHostnameFromClient
- TestExpireNode
- TestSetNodeExpiryInFuture
- TestDisableNodeExpiry
- TestNodeOnlineStatus
- TestPingAllByIPManyUpDown
- Test2118DeletingOnlineNodePanics
- TestGrantCapRelay
- TestGrantCapDrive
- TestEnablingRoutes
- TestHASubnetRouterFailover
- TestSubnetRouteACL
- TestEnablingExitRoutes
- TestSubnetRouterMultiNetwork
- TestSubnetRouterMultiNetworkExitNode
- TestAutoApproveMultiNetwork
- TestAutoApproveMultiNetwork/authkey-tag.*
- TestAutoApproveMultiNetwork/authkey-user.*
- TestAutoApproveMultiNetwork/authkey-group.*
- TestAutoApproveMultiNetwork/webauth-tag.*
- TestAutoApproveMultiNetwork/webauth-user.*
- TestAutoApproveMultiNetwork/webauth-group.*
- TestSubnetRouteACLFiltering
- TestGrantViaSubnetSteering
- TestHeadscale
- TestTailscaleNodesJoiningHeadcale
- TestSSHOneUserToAll
@@ -79,12 +256,55 @@ jobs:
- TestSSHNoSSHConfigured
- TestSSHIsBlockedInACL
- TestSSHUserOnlyIsolation
- TestSSHAutogroupSelf
- TestSSHOneUserToOneCheckModeCLI
- TestSSHOneUserToOneCheckModeOIDC
- TestSSHCheckModeUnapprovedTimeout
- TestSSHCheckModeCheckPeriodCLI
- TestSSHCheckModeAutoApprove
- TestSSHCheckModeNegativeCLI
- TestSSHLocalpart
- TestTagsAuthKeyWithTagRequestDifferentTag
- TestTagsAuthKeyWithTagNoAdvertiseFlag
- TestTagsAuthKeyWithTagCannotAddViaCLI
- TestTagsAuthKeyWithTagCannotChangeViaCLI
- TestTagsAuthKeyWithTagAdminOverrideReauthPreserves
- TestTagsAuthKeyWithTagCLICannotModifyAdminTags
- TestTagsAuthKeyWithoutTagCannotRequestTags
- TestTagsAuthKeyWithoutTagRegisterNoTags
- TestTagsAuthKeyWithoutTagCannotAddViaCLI
- TestTagsAuthKeyWithoutTagCLINoOpAfterAdminWithReset
- TestTagsAuthKeyWithoutTagCLINoOpAfterAdminWithEmptyAdvertise
- TestTagsAuthKeyWithoutTagCLICannotReduceAdminMultiTag
- TestTagsUserLoginOwnedTagAtRegistration
- TestTagsUserLoginNonExistentTagAtRegistration
- TestTagsUserLoginUnownedTagAtRegistration
- TestTagsUserLoginAddTagViaCLIReauth
- TestTagsUserLoginRemoveTagViaCLIReauth
- TestTagsUserLoginCLINoOpAfterAdminAssignment
- TestTagsUserLoginCLICannotRemoveAdminTags
- TestTagsAuthKeyWithTagRequestNonExistentTag
- TestTagsAuthKeyWithTagRequestUnownedTag
- TestTagsAuthKeyWithoutTagRequestNonExistentTag
- TestTagsAuthKeyWithoutTagRequestUnownedTag
- TestTagsAdminAPICannotSetNonExistentTag
- TestTagsAdminAPICanSetUnownedTag
- TestTagsAdminAPICannotRemoveAllTags
- TestTagsIssue2978ReproTagReplacement
- TestTagsAdminAPICannotSetInvalidFormat
- TestTagsUserLoginReauthWithEmptyTagsRemovesAllTags
- TestTagsAuthKeyWithoutUserInheritsTags
- TestTagsAuthKeyWithoutUserRejectsAdvertisedTags
- TestTagsAuthKeyConvertToUserViaCLIRegister
uses: ./.github/workflows/integration-test-template.yml
secrets: inherit
with:
test: ${{ matrix.test }}
postgres_flag: "--postgres=0"
database_name: "sqlite"
postgres:
needs: [build, build-postgres]
if: needs.build.outputs.files-changed == 'true'
strategy:
fail-fast: false
matrix:
@@ -95,6 +315,7 @@ jobs:
- TestPingAllByIPManyUpDown
- TestSubnetRouterMultiNetwork
uses: ./.github/workflows/integration-test-template.yml
secrets: inherit
with:
test: ${{ matrix.test }}
postgres_flag: "--postgres=1"

View File

@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
fetch-depth: 2
@@ -27,13 +27,12 @@ jobs:
- 'integration_test/'
- 'config-example.yaml'
- uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
- uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
if: steps.changed-files.outputs.files == 'true'
- uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
if: steps.changed-files.outputs.files == 'true'
with:
primary-key:
nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
'**/flake.lock') }}
restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

7
.gitignore vendored
View File

@@ -1,6 +1,10 @@
ignored/
tailscale/
.vscode/
.claude/
logs/
*.prof
# Binaries for programs and plugins
*.exe
@@ -25,6 +29,7 @@ config*.yaml
!config-example.yaml
derp.yaml
*.hujson
!hscontrol/policy/v2/testdata/*/*.hujson
*.key
/db.sqlite
*.sqlite3
@@ -47,8 +52,6 @@ integration_test/etc/config.dump.yaml
__debug_bin
node_modules/
package-lock.json
package.json

View File

@@ -7,6 +7,7 @@ linters:
- depguard
- dupl
- exhaustruct
- funcorder
- funlen
- gochecknoglobals
- gochecknoinits
@@ -17,6 +18,7 @@ linters:
- lll
- maintidx
- makezero
- mnd
- musttag
- nestif
- nolintlint
@@ -28,6 +30,32 @@ linters:
- wrapcheck
- wsl
settings:
forbidigo:
forbid:
# Forbid time.Sleep everywhere with context-appropriate alternatives
- pattern: 'time\.Sleep'
msg: >-
time.Sleep is forbidden.
In tests: use assert.EventuallyWithT for polling/waiting patterns.
In production code: use a backoff strategy (e.g., cenkalti/backoff) or proper synchronization primitives.
# Forbid inline string literals in zerolog field methods - use zf.* constants
- pattern: '\.(Str|Int|Int8|Int16|Int32|Int64|Uint|Uint8|Uint16|Uint32|Uint64|Float32|Float64|Bool|Dur|Time|TimeDiff|Strs|Ints|Uints|Floats|Bools|Any|Interface)\("[^"]+"'
msg: >-
Use zf.* constants for zerolog field names instead of string literals.
Import "github.com/juanfont/headscale/hscontrol/util/zlog/zf" and use
constants like zf.NodeID, zf.UserName, etc. Add new constants to
hscontrol/util/zlog/zf/fields.go if needed.
# Forbid ptr.To - use Go 1.26 new(expr) instead
- pattern: 'ptr\.To\('
msg: >-
ptr.To is forbidden. Use Go 1.26's new(expr) syntax instead.
Example: ptr.To(value) → new(value)
# Forbid tsaddr.SortPrefixes - use slices.SortFunc with netip.Prefix.Compare
- pattern: 'tsaddr\.SortPrefixes'
msg: >-
tsaddr.SortPrefixes is forbidden. Use Go 1.26's netip.Prefix.Compare instead.
Example: slices.SortFunc(prefixes, netip.Prefix.Compare)
analyze-types: true
gocritic:
disabled-checks:
- appendAssign

View File

@@ -2,12 +2,16 @@
version: 2
before:
hooks:
- go mod tidy -compat=1.24
- go mod tidy -compat=1.26
- go mod vendor
release:
prerelease: auto
draft: true
header: |
## Upgrade
Please follow the steps outlined in the [upgrade guide](https://headscale.net/stable/setup/upgrade/) to update your existing Headscale installation.
builds:
- id: headscale
@@ -19,20 +23,10 @@ builds:
- darwin_amd64
- darwin_arm64
- freebsd_amd64
- linux_386
- linux_amd64
- linux_arm64
- linux_arm_5
- linux_arm_6
- linux_arm_7
flags:
- -mod=readonly
ldflags:
- -s -w
- -X github.com/juanfont/headscale/hscontrol/types.Version={{ .Version }}
- -X github.com/juanfont/headscale/hscontrol/types.GitCommitHash={{ .Commit }}
tags:
- ts2019
archives:
- id: golang-cross
@@ -106,16 +100,14 @@ kos:
# bare tells KO to only use the repository
# for tagging and naming the container.
bare: true
base_image: gcr.io/distroless/base-debian12
base_image: gcr.io/distroless/base-debian13
build: headscale
main: ./cmd/headscale
env:
- CGO_ENABLED=0
platforms:
- linux/amd64
- linux/386
- linux/arm64
- linux/arm/v7
tags:
- "{{ if not .Prerelease }}latest{{ end }}"
- "{{ if not .Prerelease }}{{ .Major }}.{{ .Minor }}.{{ .Patch }}{{ end }}"
@@ -128,6 +120,8 @@ kos:
- "{{ .Tag }}"
- '{{ trimprefix .Tag "v" }}'
- "sha-{{ .ShortCommit }}"
creation_time: "{{.CommitTimestamp}}"
ko_data_creation_time: "{{.CommitTimestamp}}"
- id: ghcr-debug
repositories:
@@ -135,16 +129,14 @@ kos:
- headscale/headscale
bare: true
base_image: gcr.io/distroless/base-debian12:debug
base_image: gcr.io/distroless/base-debian13:debug
build: headscale
main: ./cmd/headscale
env:
- CGO_ENABLED=0
platforms:
- linux/amd64
- linux/386
- linux/arm64
- linux/arm/v7
tags:
- "{{ if not .Prerelease }}latest-debug{{ end }}"
- "{{ if not .Prerelease }}{{ .Major }}.{{ .Minor }}.{{ .Patch }}-debug{{ end }}"

34
.mcp.json Normal file
View File

@@ -0,0 +1,34 @@
{
"mcpServers": {
"claude-code-mcp": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@steipete/claude-code-mcp@latest"],
"env": {}
},
"sequential-thinking": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-sequential-thinking"],
"env": {}
},
"nixos": {
"type": "stdio",
"command": "uvx",
"args": ["mcp-nixos"],
"env": {}
},
"context7": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@upstash/context7-mcp"],
"env": {}
},
"git": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@cyanheads/git-mcp-server"],
"env": {}
}
}
}

2
.mdformat.toml Normal file
View File

@@ -0,0 +1,2 @@
[plugin.mkdocs]
align_semantic_breaks_in_lists = true

62
.pre-commit-config.yaml Normal file
View File

@@ -0,0 +1,62 @@
# prek/pre-commit configuration for headscale
# See: https://prek.j178.dev/quickstart/
# See: https://prek.j178.dev/builtin/
# Global exclusions - ignore generated code
exclude: ^gen/
repos:
# Built-in hooks from pre-commit/pre-commit-hooks
# prek will use fast-path optimized versions automatically
# See: https://prek.j178.dev/builtin/
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
hooks:
- id: check-added-large-files
- id: check-case-conflict
- id: check-executables-have-shebangs
- id: check-json
- id: check-merge-conflict
- id: check-symlinks
- id: check-toml
- id: check-xml
- id: check-yaml
- id: detect-private-key
- id: end-of-file-fixer
- id: fix-byte-order-marker
- id: mixed-line-ending
- id: trailing-whitespace
# Local hooks for project-specific tooling
- repo: local
hooks:
# nixpkgs-fmt for Nix files
- id: nixpkgs-fmt
name: nixpkgs-fmt
entry: nixpkgs-fmt
language: system
files: \.nix$
# Prettier for formatting
- id: prettier
name: prettier
entry: prettier --write --list-different
language: system
exclude: ^docs/
types_or: [javascript, jsx, ts, tsx, yaml, json, toml, html, css, scss, sass, markdown]
# mdformat for docs
- id: mdformat
name: mdformat
entry: mdformat
language: system
types_or: [markdown]
files: ^docs/
# golangci-lint for Go code quality
- id: golangci-lint
name: golangci-lint
entry: nix develop --command -- golangci-lint run --new-from-rev=HEAD~1 --timeout=5m --fix
language: system
types: [go]
pass_filenames: false

View File

@@ -1,5 +1,2 @@
.github/workflows/test-integration-v2*
docs/about/features.md
docs/ref/configuration.md
docs/ref/oidc.md
docs/ref/remote-cli.md
docs/

1051
AGENTS.md Normal file

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

396
CLAUDE.md
View File

@@ -1,395 +1 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
Headscale is an open-source implementation of the Tailscale control server written in Go. It provides self-hosted coordination for Tailscale networks (tailnets), managing node registration, IP allocation, policy enforcement, and DERP routing.
## Development Commands
### Quick Setup
```bash
# Recommended: Use Nix for dependency management
nix develop
# Full development workflow
make dev # runs fmt + lint + test + build
```
### Essential Commands
```bash
# Build headscale binary
make build
# Run tests
make test
go test ./... # All unit tests
go test -race ./... # With race detection
# Run specific integration test
go run ./cmd/hi run "TestName" --postgres
# Code formatting and linting
make fmt # Format all code (Go, docs, proto)
make lint # Lint all code (Go, proto)
make fmt-go # Format Go code only
make lint-go # Lint Go code only
# Protocol buffer generation (after modifying proto/)
make generate
# Clean build artifacts
make clean
```
### Integration Testing
```bash
# Use the hi (Headscale Integration) test runner
go run ./cmd/hi doctor # Check system requirements
go run ./cmd/hi run "TestPattern" # Run specific test
go run ./cmd/hi run "TestPattern" --postgres # With PostgreSQL backend
# Test artifacts are saved to control_logs/ with logs and debug data
```
## Project Structure & Architecture
### Top-Level Organization
```
headscale/
├── cmd/ # Command-line applications
│ ├── headscale/ # Main headscale server binary
│ └── hi/ # Headscale Integration test runner
├── hscontrol/ # Core control plane logic
├── integration/ # End-to-end Docker-based tests
├── proto/ # Protocol buffer definitions
├── gen/ # Generated code (protobuf)
├── docs/ # Documentation
└── packaging/ # Distribution packaging
```
### Core Packages (`hscontrol/`)
**Main Server (`hscontrol/`)**
- `app.go`: Application setup, dependency injection, server lifecycle
- `handlers.go`: HTTP/gRPC API endpoints for management operations
- `grpcv1.go`: gRPC service implementation for headscale API
- `poll.go`: **Critical** - Handles Tailscale MapRequest/MapResponse protocol
- `noise.go`: Noise protocol implementation for secure client communication
- `auth.go`: Authentication flows (web, OIDC, command-line)
- `oidc.go`: OpenID Connect integration for user authentication
**State Management (`hscontrol/state/`)**
- `state.go`: Central coordinator for all subsystems (database, policy, IP allocation, DERP)
- `node_store.go`: **Performance-critical** - In-memory cache with copy-on-write semantics
- Thread-safe operations with deadlock detection
- Coordinates between database persistence and real-time operations
**Database Layer (`hscontrol/db/`)**
- `db.go`: Database abstraction, GORM setup, migration management
- `node.go`: Node lifecycle, registration, expiration, IP assignment
- `users.go`: User management, namespace isolation
- `api_key.go`: API authentication tokens
- `preauth_keys.go`: Pre-authentication keys for automated node registration
- `ip.go`: IP address allocation and management
- `policy.go`: Policy storage and retrieval
- Schema migrations in `schema.sql` with extensive test data coverage
**Policy Engine (`hscontrol/policy/`)**
- `policy.go`: Core ACL evaluation logic, HuJSON parsing
- `v2/`: Next-generation policy system with improved filtering
- `matcher/`: ACL rule matching and evaluation engine
- Determines peer visibility, route approval, and network access rules
- Supports both file-based and database-stored policies
**Network Management (`hscontrol/`)**
- `derp/`: DERP (Designated Encrypted Relay for Packets) server implementation
- NAT traversal when direct connections fail
- Fallback relay for firewall-restricted environments
- `mapper/`: Converts internal Headscale state to Tailscale's wire protocol format
- `tail.go`: Tailscale-specific data structure generation
- `routes/`: Subnet route management and primary route selection
- `dns/`: DNS record management and MagicDNS implementation
**Utilities & Support (`hscontrol/`)**
- `types/`: Core data structures, configuration, validation
- `util/`: Helper functions for networking, DNS, key management
- `templates/`: Client configuration templates (Apple, Windows, etc.)
- `notifier/`: Event notification system for real-time updates
- `metrics.go`: Prometheus metrics collection
- `capver/`: Tailscale capability version management
### Key Subsystem Interactions
**Node Registration Flow**
1. **Client Connection**: `noise.go` handles secure protocol handshake
2. **Authentication**: `auth.go` validates credentials (web/OIDC/preauth)
3. **State Creation**: `state.go` coordinates IP allocation via `db/ip.go`
4. **Storage**: `db/node.go` persists node, `NodeStore` caches in memory
5. **Network Setup**: `mapper/` generates initial Tailscale network map
**Ongoing Operations**
1. **Poll Requests**: `poll.go` receives periodic client updates
2. **State Updates**: `NodeStore` maintains real-time node information
3. **Policy Application**: `policy/` evaluates ACL rules for peer relationships
4. **Map Distribution**: `mapper/` sends network topology to all affected clients
**Route Management**
1. **Advertisement**: Clients announce routes via `poll.go` Hostinfo updates
2. **Storage**: `db/` persists routes, `NodeStore` caches for performance
3. **Approval**: `policy/` auto-approves routes based on ACL rules
4. **Distribution**: `routes/` selects primary routes, `mapper/` distributes to peers
### Command-Line Tools (`cmd/`)
**Main Server (`cmd/headscale/`)**
- `headscale.go`: CLI parsing, configuration loading, server startup
- Supports daemon mode, CLI operations (user/node management), database operations
**Integration Test Runner (`cmd/hi/`)**
- `main.go`: Test execution framework with Docker orchestration
- `run.go`: Individual test execution with artifact collection
- `doctor.go`: System requirements validation
- `docker.go`: Container lifecycle management
- Essential for validating changes against real Tailscale clients
### Generated & External Code
**Protocol Buffers (`proto/` → `gen/`)**
- Defines gRPC API for headscale management operations
- Client libraries can generate from these definitions
- Run `make generate` after modifying `.proto` files
**Integration Testing (`integration/`)**
- `scenario.go`: Docker test environment setup
- `tailscale.go`: Tailscale client container management
- Individual test files for specific functionality areas
- Real end-to-end validation with network isolation
### Critical Performance Paths
**High-Frequency Operations**
1. **MapRequest Processing** (`poll.go`): Every 15-60 seconds per client
2. **NodeStore Reads** (`node_store.go`): Every operation requiring node data
3. **Policy Evaluation** (`policy/`): On every peer relationship calculation
4. **Route Lookups** (`routes/`): During network map generation
**Database Write Patterns**
- **Frequent**: Node heartbeats, endpoint updates, route changes
- **Moderate**: User operations, policy updates, API key management
- **Rare**: Schema migrations, bulk operations
### Configuration & Deployment
**Configuration** (`hscontrol/types/config.go`)**
- Database connection settings (SQLite/PostgreSQL)
- Network configuration (IP ranges, DNS settings)
- Policy mode (file vs database)
- DERP relay configuration
- OIDC provider settings
**Key Dependencies**
- **GORM**: Database ORM with migration support
- **Tailscale Libraries**: Core networking and protocol code
- **Zerolog**: Structured logging throughout the application
- **Buf**: Protocol buffer toolchain for code generation
### Development Workflow Integration
The architecture supports incremental development:
- **Unit Tests**: Focus on individual packages (`*_test.go` files)
- **Integration Tests**: Validate cross-component interactions
- **Database Tests**: Extensive migration and data integrity validation
- **Policy Tests**: ACL rule evaluation and edge cases
- **Performance Tests**: NodeStore and high-frequency operation validation
## Integration Test System
### Overview
Integration tests use Docker containers running real Tailscale clients against a Headscale server. Tests validate end-to-end functionality including routing, ACLs, node lifecycle, and network coordination.
### Running Integration Tests
**System Requirements**
```bash
# Check if your system is ready
go run ./cmd/hi doctor
```
This verifies Docker, Go, required images, and disk space.
**Test Execution Patterns**
```bash
# Run a single test (recommended for development)
go run ./cmd/hi run "TestSubnetRouterMultiNetwork"
# Run with PostgreSQL backend (for database-heavy tests)
go run ./cmd/hi run "TestExpireNode" --postgres
# Run multiple tests with pattern matching
go run ./cmd/hi run "TestSubnet*"
# Run all integration tests (CI/full validation)
go test ./integration -timeout 30m
```
**Test Categories & Timing**
- **Fast tests** (< 2 min): Basic functionality, CLI operations
- **Medium tests** (2-5 min): Route management, ACL validation
- **Slow tests** (5+ min): Node expiration, HA failover
- **Long-running tests** (10+ min): `TestNodeOnlineStatus` (12 min duration)
### Test Infrastructure
**Docker Setup**
- Headscale server container with configurable database backend
- Multiple Tailscale client containers with different versions
- Isolated networks per test scenario
- Automatic cleanup after test completion
**Test Artifacts**
All test runs save artifacts to `control_logs/TIMESTAMP-ID/`:
```
control_logs/20250713-213106-iajsux/
├── hs-testname-abc123.stderr.log # Headscale server logs
├── hs-testname-abc123.stdout.log
├── hs-testname-abc123.db # Database snapshot
├── hs-testname-abc123_metrics.txt # Prometheus metrics
├── hs-testname-abc123-mapresponses/ # Protocol debug data
├── ts-client-xyz789.stderr.log # Tailscale client logs
├── ts-client-xyz789.stdout.log
└── ts-client-xyz789_status.json # Client status dump
```
### Test Development Guidelines
**Timing Considerations**
Integration tests involve real network operations and Docker container lifecycle:
```go
// ❌ Wrong: Immediate assertions after async operations
client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
nodes, _ := headscale.ListNodes()
require.Len(t, nodes[0].GetAvailableRoutes(), 1) // May fail due to timing
// ✅ Correct: Wait for async operations to complete
client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
require.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
assert.Len(c, nodes[0].GetAvailableRoutes(), 1)
}, 10*time.Second, 100*time.Millisecond, "route should be advertised")
```
**Common Test Patterns**
- **Route Advertisement**: Use `EventuallyWithT` for route propagation
- **Node State Changes**: Wait for NodeStore synchronization
- **ACL Policy Changes**: Allow time for policy recalculation
- **Network Connectivity**: Use ping tests with retries
**Test Data Management**
```go
// Node identification: Don't assume array ordering
expectedRoutes := map[string]string{"1": "10.33.0.0/16"}
for _, node := range nodes {
nodeIDStr := fmt.Sprintf("%d", node.GetId())
if route, shouldHaveRoute := expectedRoutes[nodeIDStr]; shouldHaveRoute {
// Test the node that should have the route
}
}
```
### Troubleshooting Integration Tests
**Common Failure Patterns**
1. **Timing Issues**: Test assertions run before async operations complete
- **Solution**: Use `EventuallyWithT` with appropriate timeouts
- **Timeout Guidelines**: 3-5s for route operations, 10s for complex scenarios
2. **Infrastructure Problems**: Disk space, Docker issues, network conflicts
- **Check**: `go run ./cmd/hi doctor` for system health
- **Clean**: Remove old test containers and networks
3. **NodeStore Synchronization**: Tests expecting immediate data availability
- **Key Points**: Route advertisements must propagate through poll requests
- **Fix**: Wait for NodeStore updates after Hostinfo changes
4. **Database Backend Differences**: SQLite vs PostgreSQL behavior differences
- **Use**: `--postgres` flag for database-intensive tests
- **Note**: Some timing characteristics differ between backends
**Debugging Failed Tests**
1. **Check test artifacts** in `control_logs/` for detailed logs
2. **Examine MapResponse JSON** files for protocol-level debugging
3. **Review Headscale stderr logs** for server-side error messages
4. **Check Tailscale client status** for network-level issues
**Resource Management**
- Tests require significant disk space (each run ~100MB of logs)
- Docker containers are cleaned up automatically on success
- Failed tests may leave containers running - clean manually if needed
- Use `docker system prune` periodically to reclaim space
### Best Practices for Test Modifications
1. **Always test locally** before committing integration test changes
2. **Use appropriate timeouts** - too short causes flaky tests, too long slows CI
3. **Clean up properly** - ensure tests don't leave persistent state
4. **Handle both success and failure paths** in test scenarios
5. **Document timing requirements** for complex test scenarios
## NodeStore Implementation Details
**Key Insight from Recent Work**: The NodeStore is a critical performance optimization that caches node data in memory while ensuring consistency with the database. When working with route advertisements or node state changes:
1. **Timing Considerations**: Route advertisements need time to propagate from clients to server. Use `require.EventuallyWithT()` patterns in tests instead of immediate assertions.
2. **Synchronization Points**: NodeStore updates happen at specific points like `poll.go:420` after Hostinfo changes. Ensure these are maintained when modifying the polling logic.
3. **Peer Visibility**: The NodeStore's `peersFunc` determines which nodes are visible to each other. Policy-based filtering is separate from monitoring visibility - expired nodes should remain visible for debugging but marked as expired.
## Testing Guidelines
### Integration Test Patterns
```go
// Use EventuallyWithT for async operations
require.EventuallyWithT(t, func(c *assert.CollectT) {
nodes, err := headscale.ListNodes()
assert.NoError(c, err)
// Check expected state
}, 10*time.Second, 100*time.Millisecond, "description")
// Node route checking by actual node properties, not array position
var routeNode *v1.Node
for _, node := range nodes {
if nodeIDStr := fmt.Sprintf("%d", node.GetId()); expectedRoutes[nodeIDStr] != "" {
routeNode = node
break
}
}
```
### Running Problematic Tests
- Some tests require significant time (e.g., `TestNodeOnlineStatus` runs for 12 minutes)
- Infrastructure issues like disk space can cause test failures unrelated to code changes
- Use `--postgres` flag when testing database-heavy scenarios
## Important Notes
- **Dependencies**: Use `nix develop` for consistent toolchain (Go, buf, protobuf tools, linting)
- **Protocol Buffers**: Changes to `proto/` require `make generate` and should be committed separately
- **Code Style**: Enforced via golangci-lint with golines (width 88) and gofumpt formatting
- **Database**: Supports both SQLite (development) and PostgreSQL (production/testing)
- **Integration Tests**: Require Docker and can consume significant disk space
- **Performance**: NodeStore optimizations are critical for scale - be careful with changes to state management
## Debugging Integration Tests
Test artifacts are preserved in `control_logs/TIMESTAMP-ID/` including:
- Headscale server logs (stderr/stdout)
- Tailscale client logs and status
- Database dumps and network captures
- MapResponse JSON files for protocol debugging
When tests fail, check these artifacts first before assuming code issues.
@AGENTS.md

File diff suppressed because it is too large Load Diff

View File

@@ -1,201 +0,0 @@
# CLI Standardization Summary
## Changes Made
### 1. Command Naming Standardization
- **Fixed**: `backfillips``backfill-ips` (with backward compat alias)
- **Fixed**: `dumpConfig``dump-config` (with backward compat alias)
- **Result**: All commands now use kebab-case consistently
### 2. Flag Standardization
#### Node Commands
- **Added**: `--node` flag as primary way to specify nodes
- **Deprecated**: `--identifier` flag (hidden, marked deprecated)
- **Backward Compatible**: Both flags work, `--identifier` shows deprecation warning
- **Smart Lookup Ready**: `--node` accepts strings for future name/hostname/IP lookup
#### User Commands
- **Updated**: User identification flow prepared for `--user` flag
- **Maintained**: Existing `--name` and `--identifier` flags for backward compatibility
### 3. Description Consistency
- **Fixed**: "Api" → "API" throughout
- **Fixed**: Capitalization consistency in short descriptions
- **Fixed**: Removed unnecessary periods from short descriptions
- **Standardized**: "Handle/Manage the X of Headscale" pattern
### 4. Type Consistency
- **Standardized**: Node IDs use `uint64` consistently
- **Maintained**: Backward compatibility with existing flag types
## Current Status
### ✅ Completed
- Command naming (kebab-case)
- Flag deprecation and aliasing
- Description standardization
- Backward compatibility preservation
- Helper functions for flag processing
- **SMART LOOKUP IMPLEMENTATION**:
- Enhanced `ListNodesRequest` proto with ID, name, hostname, IP filters
- Implemented smart filtering in `ListNodes` gRPC method
- Added CLI smart lookup functions for nodes and users
- Single match validation with helpful error messages
- Automatic detection: ID (numeric) vs IP vs name/hostname/email
### ✅ Smart Lookup Features
- **Node Lookup**: By ID, hostname, or IP address
- **User Lookup**: By ID, username, or email address
- **Single Match Enforcement**: Errors if 0 or >1 matches found
- **Helpful Error Messages**: Shows all matches when ambiguous
- **Full Backward Compatibility**: All existing flags still work
- **Enhanced List Commands**: Both `nodes list` and `users list` support all filter types
## Breaking Changes
**None.** All changes maintain full backward compatibility through flag aliases and deprecation warnings.
## Implementation Details
### Smart Lookup Algorithm
1. **Input Detection**:
```go
if numeric && > 0 -> treat as ID
else if contains "@" -> treat as email (users only)
else if valid IP address -> treat as IP (nodes only)
else -> treat as name/hostname
```
2. **gRPC Filtering**:
- Uses enhanced `ListNodes`/`ListUsers` with specific filters
- Server-side filtering for optimal performance
- Single transaction per lookup
3. **Match Validation**:
- Exactly 1 match: Return ID
- 0 matches: Error with "not found" message
- >1 matches: Error listing all matches for disambiguation
### Enhanced Proto Definitions
```protobuf
message ListNodesRequest {
string user = 1; // existing
uint64 id = 2; // new: filter by ID
string name = 3; // new: filter by hostname
string hostname = 4; // new: alias for name
repeated string ip_addresses = 5; // new: filter by IPs
}
```
### Future Enhancements
- **Fuzzy Matching**: Partial name matching with confirmation
- **Recently Used**: Cache recently accessed nodes/users
- **Tab Completion**: Shell completion for names/hostnames
- **Bulk Operations**: Multi-select with pattern matching
## Migration Path for Users
### Now Available (Current Release)
```bash
# Old way (still works, shows deprecation warning)
headscale nodes expire --identifier 123
# New way with smart lookup:
headscale nodes expire --node 123 # by ID
headscale nodes expire --node "my-laptop" # by hostname
headscale nodes expire --node "100.64.0.1" # by Tailscale IP
headscale nodes expire --node "192.168.1.100" # by real IP
# User operations:
headscale users destroy --user 123 # by ID
headscale users destroy --user "alice" # by username
headscale users destroy --user "alice@company.com" # by email
# Enhanced list commands with filtering:
headscale nodes list --node "laptop" # filter nodes by name
headscale nodes list --ip "100.64.0.1" # filter nodes by IP
headscale nodes list --user "alice" # filter nodes by user
headscale users list --user "alice" # smart lookup user
headscale users list --email "@company.com" # filter by email domain
headscale users list --name "alice" # filter by exact name
# Error handling examples:
headscale nodes expire --node "laptop"
# Error: multiple nodes found matching 'laptop': ID=1 name=laptop-alice, ID=2 name=laptop-bob
headscale nodes expire --node "nonexistent"
# Error: no node found matching 'nonexistent'
```
## Command Structure Overview
```
headscale [global-flags] <command> [command-flags] <subcommand> [subcommand-flags] [args]
Global Flags:
--config, -c config file path
--output, -o output format (json, yaml, json-line)
--force disable prompts
Commands:
├── serve
├── version
├── config-test
├── dump-config (alias: dumpConfig)
├── mockoidc
├── generate/
│ └── private-key
├── nodes/
│ ├── list (--user, --tags, --columns)
│ ├── register (--user, --key)
│ ├── list-routes (--node)
│ ├── expire (--node)
│ ├── rename (--node) <new-name>
│ ├── delete (--node)
│ ├── move (--node, --user)
│ ├── tag (--node, --tags)
│ ├── approve-routes (--node, --routes)
│ └── backfill-ips (alias: backfillips)
├── users/
│ ├── create <name> (--display-name, --email, --picture-url)
│ ├── list (--user, --name, --email, --columns)
│ ├── destroy (--user|--name|--identifier)
│ └── rename (--user|--name|--identifier, --new-name)
├── apikeys/
│ ├── list
│ ├── create (--expiration)
│ ├── expire (--prefix)
│ └── delete (--prefix)
├── preauthkeys/
│ ├── list (--user)
│ ├── create (--user, --reusable, --ephemeral, --expiration, --tags)
│ └── expire (--user) <key>
├── policy/
│ ├── get
│ ├── set (--file)
│ └── check (--file)
└── debug/
└── create-node (--name, --user, --key, --route)
```
## Deprecated Flags
All deprecated flags continue to work but show warnings:
- `--identifier` → use `--node` (for node commands) or `--user` (for user commands)
- `--namespace` → use `--user` (already implemented)
- `dumpConfig` → use `dump-config`
- `backfillips` → use `backfill-ips`
## Error Handling
Improved error messages provide clear guidance:
```
Error: node specifier must be a numeric ID (smart lookup by name/hostname/IP not yet implemented)
Error: --node flag is required
Error: --user flag is required
```

View File

@@ -1,6 +1,6 @@
# For testing purposes only
FROM golang:alpine AS build-env
FROM golang:1.26.1-alpine AS build-env
WORKDIR /go/src
@@ -12,7 +12,7 @@ WORKDIR /go/src/tailscale
ARG TARGETARCH
RUN GOARCH=$TARGETARCH go install -v ./cmd/derper
FROM alpine:3.18
FROM alpine:3.22
RUN apk add --no-cache ca-certificates iptables iproute2 ip6tables curl
COPY --from=build-env /go/bin/* /usr/local/bin/

View File

@@ -2,25 +2,43 @@
# and are in no way endorsed by Headscale's maintainers as an
# official nor supported release or distribution.
FROM docker.io/golang:1.24-bookworm
FROM docker.io/golang:1.26.1-trixie AS builder
ARG VERSION=dev
ENV GOPATH /go
WORKDIR /go/src/headscale
RUN apt-get update \
&& apt-get install --no-install-recommends --yes less jq sqlite3 dnsutils \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
RUN mkdir -p /var/run/headscale
# Install delve debugger first - rarely changes, good cache candidate
RUN go install github.com/go-delve/delve/cmd/dlv@latest
# Download dependencies - only invalidated when go.mod/go.sum change
COPY go.mod go.sum /go/src/headscale/
RUN go mod download
# Copy source and build - invalidated on any source change
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go install -a ./cmd/headscale && test -e /go/bin/headscale
# Build debug binary with debug symbols for delve
RUN CGO_ENABLED=0 GOOS=linux go build -gcflags="all=-N -l" -o /go/bin/headscale ./cmd/headscale
# Runtime stage
FROM debian:trixie-slim
RUN apt-get --update install --no-install-recommends --yes \
bash ca-certificates curl dnsutils findutils iproute2 jq less procps python3 sqlite3 \
&& apt-get dist-clean
RUN mkdir -p /var/run/headscale
# Copy binaries from builder
COPY --from=builder /go/bin/headscale /usr/local/bin/headscale
COPY --from=builder /go/bin/dlv /usr/local/bin/dlv
# Copy source code for delve source-level debugging
COPY --from=builder /go/src/headscale /go/src/headscale
WORKDIR /go/src/headscale
# Need to reset the entrypoint or everything will run as a busybox script
ENTRYPOINT []
EXPOSE 8080/tcp
CMD ["headscale"]
EXPOSE 8080/tcp 40000/tcp
CMD ["dlv", "--listen=0.0.0.0:40000", "--headless=true", "--api-version=2", "--accept-multiclient", "exec", "/usr/local/bin/headscale", "--"]

17
Dockerfile.integration-ci Normal file
View File

@@ -0,0 +1,17 @@
# Minimal CI image - expects pre-built headscale binary in build context
# For local development with delve debugging, use Dockerfile.integration instead
FROM debian:trixie-slim
RUN apt-get --update install --no-install-recommends --yes \
bash ca-certificates curl dnsutils findutils iproute2 jq less procps python3 sqlite3 \
&& apt-get dist-clean
RUN mkdir -p /var/run/headscale
# Copy pre-built headscale binary from build context
COPY headscale /usr/local/bin/headscale
ENTRYPOINT []
EXPOSE 8080/tcp
CMD ["/usr/local/bin/headscale"]

View File

@@ -4,7 +4,7 @@
# This Dockerfile is more or less lifted from tailscale/tailscale
# to ensure a similar build process when testing the HEAD of tailscale.
FROM golang:1.24-alpine AS build-env
FROM golang:1.26.1-alpine AS build-env
WORKDIR /go/src
@@ -36,8 +36,10 @@ RUN GOARCH=$TARGETARCH go install -tags="${BUILD_TAGS}" -ldflags="\
-X tailscale.com/version.gitCommitStamp=$VERSION_GIT_HASH" \
-v ./cmd/tailscale ./cmd/tailscaled ./cmd/containerboot
FROM alpine:3.18
RUN apk add --no-cache ca-certificates iptables iproute2 ip6tables curl
FROM alpine:3.22
# Upstream: ca-certificates ip6tables iptables iproute2
# Tests: curl python3 (traceroute via BusyBox)
RUN apk add --no-cache ca-certificates curl ip6tables iptables iproute2 python3
COPY --from=build-env /go/bin/* /usr/local/bin/
# For compat with the previous run.sh, although ideally you should be

View File

@@ -21,7 +21,7 @@ endef
# Source file collections using shell find for better performance
GO_SOURCES := $(shell find . -name '*.go' -not -path './gen/*' -not -path './vendor/*')
PROTO_SOURCES := $(shell find . -name '*.proto' -not -path './gen/*' -not -path './vendor/*')
DOC_SOURCES := $(shell find . \( -name '*.md' -o -name '*.yaml' -o -name '*.yml' -o -name '*.ts' -o -name '*.js' -o -name '*.html' -o -name '*.css' -o -name '*.scss' -o -name '*.sass' \) -not -path './gen/*' -not -path './vendor/*' -not -path './node_modules/*')
PRETTIER_SOURCES := $(shell find . \( -name '*.md' -o -name '*.yaml' -o -name '*.yml' -o -name '*.ts' -o -name '*.js' -o -name '*.html' -o -name '*.css' -o -name '*.scss' -o -name '*.sass' \) -not -path './gen/*' -not -path './vendor/*' -not -path './node_modules/*')
# Default target
.PHONY: all
@@ -33,6 +33,7 @@ check-deps:
$(call check_tool,go)
$(call check_tool,golangci-lint)
$(call check_tool,gofumpt)
$(call check_tool,mdformat)
$(call check_tool,prettier)
$(call check_tool,clang-format)
$(call check_tool,buf)
@@ -52,7 +53,7 @@ test: check-deps $(GO_SOURCES) go.mod go.sum
# Formatting targets
.PHONY: fmt
fmt: fmt-go fmt-prettier fmt-proto
fmt: fmt-go fmt-mdformat fmt-prettier fmt-proto
.PHONY: fmt-go
fmt-go: check-deps $(GO_SOURCES)
@@ -60,11 +61,15 @@ fmt-go: check-deps $(GO_SOURCES)
gofumpt -l -w .
golangci-lint run --fix
.PHONY: fmt-mdformat
fmt-mdformat: check-deps
@echo "Formatting documentation..."
mdformat docs/
.PHONY: fmt-prettier
fmt-prettier: check-deps $(DOC_SOURCES)
@echo "Formatting documentation and config files..."
fmt-prettier: check-deps $(PRETTIER_SOURCES)
@echo "Formatting markup and config files..."
prettier --write '**/*.{ts,js,md,yaml,yml,sass,css,scss,html}'
prettier --write --print-width 80 --prose-wrap always CHANGELOG.md
.PHONY: fmt-proto
fmt-proto: check-deps $(PROTO_SOURCES)
@@ -87,10 +92,9 @@ lint-proto: check-deps $(PROTO_SOURCES)
# Code generation
.PHONY: generate
generate: check-deps $(PROTO_SOURCES)
@echo "Generating code from Protocol Buffers..."
rm -rf gen
buf generate proto
generate: check-deps
@echo "Generating code..."
go generate ./...
# Clean targets
.PHONY: clean
@@ -118,7 +122,8 @@ help:
@echo ""
@echo "Specific targets:"
@echo " fmt-go - Format Go code only"
@echo " fmt-prettier - Format documentation only"
@echo " fmt-mdformat - Format documentation only"
@echo " fmt-prettier - Format markup and config files only"
@echo " fmt-proto - Format Protocol Buffer files only"
@echo " lint-go - Lint Go code only"
@echo " lint-proto - Lint Protocol Buffer files only"
@@ -127,4 +132,4 @@ help:
@echo " check-deps - Verify required tools are available"
@echo ""
@echo "Note: If not running in a nix shell, ensure dependencies are available:"
@echo " nix develop"
@echo " nix develop"

View File

@@ -1,4 +1,4 @@
![headscale logo](./docs/logo/headscale3_header_stacked_left.png)
![headscale logo](./docs/assets/logo/headscale3_header_stacked_left.png)
![ci](https://github.com/juanfont/headscale/actions/workflows/test.yml/badge.svg)
@@ -63,8 +63,18 @@ and container to run Headscale.**
Please have a look at the [`documentation`](https://headscale.net/stable/).
For NixOS users, a module is available in [`nix/`](./nix/).
## Builds from `main`
Development builds from the `main` branch are available as container images and
binaries. See the [development builds](https://headscale.net/stable/setup/install/main/)
documentation for details.
## Talks
- Fosdem 2026 (video): [Headscale & Tailscale: The complementary open source clone](https://fosdem.org/2026/schedule/event/KYQ3LL-headscale-the-complementary-open-source-clone/)
- presented by Kristoffer Dalby
- Fosdem 2023 (video): [Headscale: How we are using integration testing to reimplement Tailscale](https://fosdem.org/2023/schedule/event/goheadscale/)
- presented by Juan Font Alonso and Kristoffer Dalby
@@ -103,6 +113,8 @@ run `make lint` and `make fmt` before committing any code.
The **Proto** code is linted with [`buf`](https://docs.buf.build/lint/overview) and
formatted with [`clang-format`](https://clang.llvm.org/docs/ClangFormat.html).
The **docs** are formatted with [`mdformat`](https://mdformat.readthedocs.io).
The **rest** (Markdown, YAML, etc) is formatted with [`prettier`](https://prettier.io).
Check out the `.golangci.yaml` and `Makefile` to see the specific configuration.
@@ -147,6 +159,7 @@ make build
We recommend using Nix for dependency management to ensure you have all required tools. If you prefer to manage dependencies yourself, you can use Make directly:
**With Nix (recommended):**
```shell
nix develop
make test
@@ -154,6 +167,7 @@ make build
```
**With your own dependencies:**
```shell
make test
make build

View File

@@ -4,15 +4,16 @@ import (
"context"
"fmt"
"strconv"
"time"
v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
"github.com/juanfont/headscale/hscontrol/util"
"github.com/prometheus/common/model"
"github.com/pterm/pterm"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
"google.golang.org/protobuf/types/known/timestamppb"
)
const (
// DefaultAPIKeyExpiry is 90 days.
DefaultAPIKeyExpiry = "90d"
)
func init() {
@@ -25,52 +26,35 @@ func init() {
apiKeysCmd.AddCommand(createAPIKeyCmd)
expireAPIKeyCmd.Flags().StringP("prefix", "p", "", "ApiKey prefix")
if err := expireAPIKeyCmd.MarkFlagRequired("prefix"); err != nil {
log.Fatal().Err(err).Msg("")
}
expireAPIKeyCmd.Flags().Uint64P("id", "i", 0, "ApiKey ID")
apiKeysCmd.AddCommand(expireAPIKeyCmd)
deleteAPIKeyCmd.Flags().StringP("prefix", "p", "", "ApiKey prefix")
if err := deleteAPIKeyCmd.MarkFlagRequired("prefix"); err != nil {
log.Fatal().Err(err).Msg("")
}
deleteAPIKeyCmd.Flags().Uint64P("id", "i", 0, "ApiKey ID")
apiKeysCmd.AddCommand(deleteAPIKeyCmd)
}
var apiKeysCmd = &cobra.Command{
Use: "apikeys",
Short: "Handle the API keys in Headscale",
Short: "Handle the Api keys in Headscale",
Aliases: []string{"apikey", "api"},
}
var listAPIKeys = &cobra.Command{
Use: "list",
Short: "List the API keys for Headscale",
Short: "List the Api keys for headscale",
Aliases: []string{"ls", "show"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.ListApiKeysRequest{}
response, err := client.ListApiKeys(ctx, request)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Error getting the list of keys: %s", err),
output,
)
return err
}
if output != "" {
SuccessOutput(response.GetApiKeys(), "", output)
return nil
}
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
response, err := client.ListApiKeys(ctx, &v1.ListApiKeysRequest{})
if err != nil {
return fmt.Errorf("listing api keys: %w", err)
}
return printListOutput(cmd, response.GetApiKeys(), func() error {
tableData := pterm.TableData{
{"ID", "Prefix", "Expiration", "Created"},
}
for _, key := range response.GetApiKeys() {
expiration := "-"
@@ -84,142 +68,94 @@ var listAPIKeys = &cobra.Command{
expiration,
key.GetCreatedAt().AsTime().Format(HeadscaleDateTimeFormat),
})
}
}
err = pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Failed to render pterm table: %s", err),
output,
)
return err
}
return nil
return pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
})
if err != nil {
return
}
},
}),
}
var createAPIKeyCmd = &cobra.Command{
Use: "create",
Short: "Create a new API key",
Short: "Creates a new Api key",
Long: `
Creates a new Api key, the Api key is only visible on creation
and cannot be retrieved again.
If you loose a key, create a new one and revoke (expire) the old one.`,
Aliases: []string{"c", "new"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
request := &v1.CreateApiKeyRequest{}
durationStr, _ := cmd.Flags().GetString("expiration")
duration, err := model.ParseDuration(durationStr)
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
expiration, err := expirationFromFlag(cmd)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Could not parse duration: %s\n", err),
output,
)
return
return err
}
expiration := time.Now().UTC().Add(time.Duration(duration))
request.Expiration = timestamppb.New(expiration)
err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
response, err := client.CreateApiKey(ctx, request)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Cannot create Api Key: %s\n", err),
output,
)
return err
}
SuccessOutput(response.GetApiKey(), response.GetApiKey(), output)
return nil
response, err := client.CreateApiKey(ctx, &v1.CreateApiKeyRequest{
Expiration: expiration,
})
if err != nil {
return
return fmt.Errorf("creating api key: %w", err)
}
},
return printOutput(cmd, response.GetApiKey(), response.GetApiKey())
}),
}
// apiKeyIDOrPrefix reads --id and --prefix from cmd and validates that
// exactly one is provided.
func apiKeyIDOrPrefix(cmd *cobra.Command) (uint64, string, error) {
id, _ := cmd.Flags().GetUint64("id")
prefix, _ := cmd.Flags().GetString("prefix")
switch {
case id == 0 && prefix == "":
return 0, "", fmt.Errorf("either --id or --prefix must be provided: %w", errMissingParameter)
case id != 0 && prefix != "":
return 0, "", fmt.Errorf("only one of --id or --prefix can be provided: %w", errMissingParameter)
}
return id, prefix, nil
}
var expireAPIKeyCmd = &cobra.Command{
Use: "expire",
Short: "Expire an API key",
Short: "Expire an ApiKey",
Aliases: []string{"revoke", "exp", "e"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
prefix, err := cmd.Flags().GetString("prefix")
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
id, prefix, err := apiKeyIDOrPrefix(cmd)
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error getting prefix from CLI flag: %s", err), output)
return
return err
}
err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.ExpireApiKeyRequest{
Prefix: prefix,
}
response, err := client.ExpireApiKey(ctx, request)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Cannot expire Api Key: %s\n", err),
output,
)
return err
}
SuccessOutput(response, "Key expired", output)
return nil
response, err := client.ExpireApiKey(ctx, &v1.ExpireApiKeyRequest{
Id: id,
Prefix: prefix,
})
if err != nil {
return
return fmt.Errorf("expiring api key: %w", err)
}
},
return printOutput(cmd, response, "Key expired")
}),
}
var deleteAPIKeyCmd = &cobra.Command{
Use: "delete",
Short: "Delete an API key",
Short: "Delete an ApiKey",
Aliases: []string{"remove", "del"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
prefix, err := cmd.Flags().GetString("prefix")
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
id, prefix, err := apiKeyIDOrPrefix(cmd)
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error getting prefix from CLI flag: %s", err), output)
return
return err
}
err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.DeleteApiKeyRequest{
Prefix: prefix,
}
response, err := client.DeleteApiKey(ctx, request)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Cannot delete Api Key: %s\n", err),
output,
)
return err
}
SuccessOutput(response, "Key deleted", output)
return nil
response, err := client.DeleteApiKey(ctx, &v1.DeleteApiKeyRequest{
Id: id,
Prefix: prefix,
})
if err != nil {
return
return fmt.Errorf("deleting api key: %w", err)
}
},
return printOutput(cmd, response, "Key deleted")
}),
}

93
cmd/headscale/cli/auth.go Normal file
View File

@@ -0,0 +1,93 @@
package cli
import (
"context"
"fmt"
v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
"github.com/spf13/cobra"
)
func init() {
rootCmd.AddCommand(authCmd)
authRegisterCmd.Flags().StringP("user", "u", "", "User")
authRegisterCmd.Flags().String("auth-id", "", "Auth ID")
mustMarkRequired(authRegisterCmd, "user", "auth-id")
authCmd.AddCommand(authRegisterCmd)
authApproveCmd.Flags().String("auth-id", "", "Auth ID")
mustMarkRequired(authApproveCmd, "auth-id")
authCmd.AddCommand(authApproveCmd)
authRejectCmd.Flags().String("auth-id", "", "Auth ID")
mustMarkRequired(authRejectCmd, "auth-id")
authCmd.AddCommand(authRejectCmd)
}
var authCmd = &cobra.Command{
Use: "auth",
Short: "Manage node authentication and approval",
}
var authRegisterCmd = &cobra.Command{
Use: "register",
Short: "Register a node to your network",
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
user, _ := cmd.Flags().GetString("user")
authID, _ := cmd.Flags().GetString("auth-id")
request := &v1.AuthRegisterRequest{
AuthId: authID,
User: user,
}
response, err := client.AuthRegister(ctx, request)
if err != nil {
return fmt.Errorf("registering node: %w", err)
}
return printOutput(
cmd,
response.GetNode(),
fmt.Sprintf("Node %s registered", response.GetNode().GetGivenName()))
}),
}
var authApproveCmd = &cobra.Command{
Use: "approve",
Short: "Approve a pending authentication request",
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
authID, _ := cmd.Flags().GetString("auth-id")
request := &v1.AuthApproveRequest{
AuthId: authID,
}
response, err := client.AuthApprove(ctx, request)
if err != nil {
return fmt.Errorf("approving auth request: %w", err)
}
return printOutput(cmd, response, "Auth request approved")
}),
}
var authRejectCmd = &cobra.Command{
Use: "reject",
Short: "Reject a pending authentication request",
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
authID, _ := cmd.Flags().GetString("auth-id")
request := &v1.AuthRejectRequest{
AuthId: authID,
}
response, err := client.AuthReject(ctx, request)
if err != nil {
return fmt.Errorf("rejecting auth request: %w", err)
}
return printOutput(cmd, response, "Auth request rejected")
}),
}

View File

@@ -1,16 +0,0 @@
package cli
import (
"context"
v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
)
// WithClient handles gRPC client setup and cleanup, calls fn with client and context
func WithClient(fn func(context.Context, v1.HeadscaleServiceClient) error) error {
ctx, client, conn, cancel := newHeadscaleCLIWithConfig()
defer cancel()
defer conn.Close()
return fn(ctx, client)
}

View File

@@ -1,7 +1,8 @@
package cli
import (
"github.com/rs/zerolog/log"
"fmt"
"github.com/spf13/cobra"
)
@@ -11,12 +12,14 @@ func init() {
var configTestCmd = &cobra.Command{
Use: "configtest",
Short: "Test the configuration",
Long: "Run a test of the configuration and exit",
Run: func(cmd *cobra.Command, args []string) {
Short: "Test the configuration.",
Long: "Run a test of the configuration and exit.",
RunE: func(cmd *cobra.Command, args []string) error {
_, err := newHeadscaleServerWithConfig()
if err != nil {
log.Fatal().Caller().Err(err).Msg("Error initializing")
return fmt.Errorf("configuration error: %w", err)
}
return nil
},
}

View File

@@ -1,46 +0,0 @@
package cli
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestConfigTestCommand(t *testing.T) {
// Test that the configtest command exists and is properly configured
assert.NotNil(t, configTestCmd)
assert.Equal(t, "configtest", configTestCmd.Use)
assert.Equal(t, "Test the configuration.", configTestCmd.Short)
assert.Equal(t, "Run a test of the configuration and exit.", configTestCmd.Long)
assert.NotNil(t, configTestCmd.Run)
}
func TestConfigTestCommandInRootCommand(t *testing.T) {
// Test that configtest is available as a subcommand of root
cmd, _, err := rootCmd.Find([]string{"configtest"})
require.NoError(t, err)
assert.Equal(t, "configtest", cmd.Name())
assert.Equal(t, configTestCmd, cmd)
}
func TestConfigTestCommandHelp(t *testing.T) {
// Test that the command has proper help text
assert.NotEmpty(t, configTestCmd.Short)
assert.NotEmpty(t, configTestCmd.Long)
assert.Contains(t, configTestCmd.Short, "configuration")
assert.Contains(t, configTestCmd.Long, "test")
assert.Contains(t, configTestCmd.Long, "configuration")
}
// Note: We can't easily test the actual execution of configtest because:
// 1. It depends on configuration files being present
// 2. It calls log.Fatal() which would exit the test process
// 3. It tries to initialize a full Headscale server
//
// In a real refactor, we would:
// 1. Extract the configuration validation logic to a testable function
// 2. Return errors instead of calling log.Fatal()
// 3. Accept configuration as a parameter instead of loading from global state
//
// For now, we test the command structure and that it's properly wired up.

View File

@@ -6,34 +6,17 @@ import (
v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
"github.com/juanfont/headscale/hscontrol/types"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
"google.golang.org/grpc/status"
)
const (
errPreAuthKeyMalformed = Error("key is malformed. expected 64 hex characters with `nodekey` prefix")
)
func init() {
rootCmd.AddCommand(debugCmd)
createNodeCmd.Flags().StringP("name", "", "", "Name")
err := createNodeCmd.MarkFlagRequired("name")
if err != nil {
log.Fatal().Err(err).Msg("")
}
createNodeCmd.Flags().StringP("user", "u", "", "User")
err = createNodeCmd.MarkFlagRequired("user")
if err != nil {
log.Fatal().Err(err).Msg("")
}
createNodeCmd.Flags().StringP("key", "k", "", "Key")
err = createNodeCmd.MarkFlagRequired("key")
if err != nil {
log.Fatal().Err(err).Msg("")
}
mustMarkRequired(createNodeCmd, "name", "user", "key")
createNodeCmd.Flags().
StringSliceP("route", "r", []string{}, "List (or repeated flags) of routes to advertise")
@@ -48,79 +31,31 @@ var debugCmd = &cobra.Command{
var createNodeCmd = &cobra.Command{
Use: "create-node",
Short: "Create a node that can be registered with `nodes register <>` command",
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
Short: "Create a node that can be registered with `auth register <>` command",
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
user, _ := cmd.Flags().GetString("user")
name, _ := cmd.Flags().GetString("name")
registrationID, _ := cmd.Flags().GetString("key")
user, err := cmd.Flags().GetString("user")
_, err := types.AuthIDFromString(registrationID)
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error getting user: %s", err), output)
return
return fmt.Errorf("parsing machine key: %w", err)
}
name, err := cmd.Flags().GetString("name")
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Error getting node from flag: %s", err),
output,
)
return
routes, _ := cmd.Flags().GetStringSlice("route")
request := &v1.DebugCreateNodeRequest{
Key: registrationID,
Name: name,
User: user,
Routes: routes,
}
registrationID, err := cmd.Flags().GetString("key")
response, err := client.DebugCreateNode(ctx, request)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Error getting key from flag: %s", err),
output,
)
return
return fmt.Errorf("creating node: %w", err)
}
_, err = types.RegistrationIDFromString(registrationID)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Failed to parse machine key from flag: %s", err),
output,
)
return
}
routes, err := cmd.Flags().GetStringSlice("route")
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Error getting routes from flag: %s", err),
output,
)
return
}
err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.DebugCreateNodeRequest{
Key: registrationID,
Name: name,
User: user,
Routes: routes,
}
response, err := client.DebugCreateNode(ctx, request)
if err != nil {
ErrorOutput(
err,
"Cannot create node: "+status.Convert(err).Message(),
output,
)
return err
}
SuccessOutput(response.GetNode(), "Node created", output)
return nil
})
if err != nil {
return
}
},
return printOutput(cmd, response.GetNode(), "Node created")
}),
}

View File

@@ -1,144 +0,0 @@
package cli
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestDebugCommand(t *testing.T) {
// Test that the debug command exists and is properly configured
assert.NotNil(t, debugCmd)
assert.Equal(t, "debug", debugCmd.Use)
assert.Equal(t, "debug and testing commands", debugCmd.Short)
assert.Equal(t, "debug contains extra commands used for debugging and testing headscale", debugCmd.Long)
}
func TestDebugCommandInRootCommand(t *testing.T) {
// Test that debug is available as a subcommand of root
cmd, _, err := rootCmd.Find([]string{"debug"})
require.NoError(t, err)
assert.Equal(t, "debug", cmd.Name())
assert.Equal(t, debugCmd, cmd)
}
func TestCreateNodeCommand(t *testing.T) {
// Test that the create-node command exists and is properly configured
assert.NotNil(t, createNodeCmd)
assert.Equal(t, "create-node", createNodeCmd.Use)
assert.Equal(t, "Create a node that can be registered with `nodes register <>` command", createNodeCmd.Short)
assert.NotNil(t, createNodeCmd.Run)
}
func TestCreateNodeCommandInDebugCommand(t *testing.T) {
// Test that create-node is available as a subcommand of debug
cmd, _, err := rootCmd.Find([]string{"debug", "create-node"})
require.NoError(t, err)
assert.Equal(t, "create-node", cmd.Name())
assert.Equal(t, createNodeCmd, cmd)
}
func TestCreateNodeCommandFlags(t *testing.T) {
// Test that create-node has the required flags
// Test name flag
nameFlag := createNodeCmd.Flags().Lookup("name")
assert.NotNil(t, nameFlag)
assert.Equal(t, "", nameFlag.Shorthand) // No shorthand for name
assert.Equal(t, "", nameFlag.DefValue)
// Test user flag
userFlag := createNodeCmd.Flags().Lookup("user")
assert.NotNil(t, userFlag)
assert.Equal(t, "u", userFlag.Shorthand)
// Test key flag
keyFlag := createNodeCmd.Flags().Lookup("key")
assert.NotNil(t, keyFlag)
assert.Equal(t, "k", keyFlag.Shorthand)
// Test route flag
routeFlag := createNodeCmd.Flags().Lookup("route")
assert.NotNil(t, routeFlag)
assert.Equal(t, "r", routeFlag.Shorthand)
}
func TestCreateNodeCommandRequiredFlags(t *testing.T) {
// Test that required flags are marked as required
// We can't easily test the actual requirement enforcement without executing the command
// But we can test that the flags exist and have the expected properties
// These flags should be required based on the init() function
requiredFlags := []string{"name", "user", "key"}
for _, flagName := range requiredFlags {
flag := createNodeCmd.Flags().Lookup(flagName)
assert.NotNil(t, flag, "Required flag %s should exist", flagName)
}
}
func TestErrorType(t *testing.T) {
// Test the Error type implementation
err := errPreAuthKeyMalformed
assert.Equal(t, "key is malformed. expected 64 hex characters with `nodekey` prefix", err.Error())
assert.Equal(t, "key is malformed. expected 64 hex characters with `nodekey` prefix", string(err))
// Test that it implements the error interface
var genericErr error = err
assert.Equal(t, "key is malformed. expected 64 hex characters with `nodekey` prefix", genericErr.Error())
}
func TestErrorConstants(t *testing.T) {
// Test that error constants are defined properly
assert.Equal(t, Error("key is malformed. expected 64 hex characters with `nodekey` prefix"), errPreAuthKeyMalformed)
}
func TestDebugCommandStructure(t *testing.T) {
// Test that debug has create-node as a subcommand
found := false
for _, subcmd := range debugCmd.Commands() {
if subcmd.Name() == "create-node" {
found = true
break
}
}
assert.True(t, found, "create-node should be a subcommand of debug")
}
func TestCreateNodeCommandHelp(t *testing.T) {
// Test that the command has proper help text
assert.NotEmpty(t, createNodeCmd.Short)
assert.Contains(t, createNodeCmd.Short, "Create a node")
assert.Contains(t, createNodeCmd.Short, "nodes register")
}
func TestCreateNodeCommandFlagDescriptions(t *testing.T) {
// Test that flags have appropriate usage descriptions
nameFlag := createNodeCmd.Flags().Lookup("name")
assert.Equal(t, "Name", nameFlag.Usage)
userFlag := createNodeCmd.Flags().Lookup("user")
assert.Equal(t, "User", userFlag.Usage)
keyFlag := createNodeCmd.Flags().Lookup("key")
assert.Equal(t, "Key", keyFlag.Usage)
routeFlag := createNodeCmd.Flags().Lookup("route")
assert.Contains(t, routeFlag.Usage, "routes to advertise")
}
// Note: We can't easily test the actual execution of create-node because:
// 1. It depends on gRPC client configuration
// 2. It calls SuccessOutput/ErrorOutput which exit the process
// 3. It requires valid registration keys and user setup
//
// In a real refactor, we would:
// 1. Extract the business logic to testable functions
// 2. Use dependency injection for the gRPC client
// 3. Return errors instead of calling ErrorOutput/SuccessOutput
// 4. Add validation functions that can be tested independently
//
// For now, we test the command structure and flag configuration.

View File

@@ -12,18 +12,15 @@ func init() {
}
var dumpConfigCmd = &cobra.Command{
Use: "dump-config",
Short: "Dump current config to /etc/headscale/config.dump.yaml, integration test only",
Aliases: []string{"dumpConfig"},
Hidden: true,
Args: func(cmd *cobra.Command, args []string) error {
return nil
},
Run: func(cmd *cobra.Command, args []string) {
Use: "dumpConfig",
Short: "dump current config to /etc/headscale/config.dump.yaml, integration test only",
Hidden: true,
RunE: func(cmd *cobra.Command, args []string) error {
err := viper.WriteConfigAs("/etc/headscale/config.dump.yaml")
if err != nil {
//nolint
fmt.Println("Failed to dump config")
return fmt.Errorf("dumping config: %w", err)
}
return nil
},
}

View File

@@ -21,22 +21,17 @@ var generateCmd = &cobra.Command{
var generatePrivateKeyCmd = &cobra.Command{
Use: "private-key",
Short: "Generate a private key for the headscale server",
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
RunE: func(cmd *cobra.Command, args []string) error {
machineKey := key.NewMachine()
machineKeyStr, err := machineKey.MarshalText()
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Error getting machine key from flag: %s", err),
output,
)
return fmt.Errorf("marshalling machine key: %w", err)
}
SuccessOutput(map[string]string{
return printOutput(cmd, map[string]string{
"private_key": string(machineKeyStr),
},
string(machineKeyStr), output)
string(machineKeyStr))
},
}

View File

@@ -1,230 +0,0 @@
package cli
import (
"bytes"
"encoding/json"
"strings"
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"gopkg.in/yaml.v3"
)
func TestGenerateCommand(t *testing.T) {
// Test that the generate command exists and shows help
cmd := &cobra.Command{
Use: "headscale",
Short: "headscale - a Tailscale control server",
}
cmd.AddCommand(generateCmd)
out := new(bytes.Buffer)
cmd.SetOut(out)
cmd.SetErr(out)
cmd.SetArgs([]string{"generate", "--help"})
err := cmd.Execute()
require.NoError(t, err)
outStr := out.String()
assert.Contains(t, outStr, "Generate commands")
assert.Contains(t, outStr, "private-key")
assert.Contains(t, outStr, "Aliases:")
assert.Contains(t, outStr, "gen")
}
func TestGenerateCommandAlias(t *testing.T) {
// Test that the "gen" alias works
cmd := &cobra.Command{
Use: "headscale",
Short: "headscale - a Tailscale control server",
}
cmd.AddCommand(generateCmd)
out := new(bytes.Buffer)
cmd.SetOut(out)
cmd.SetErr(out)
cmd.SetArgs([]string{"gen", "--help"})
err := cmd.Execute()
require.NoError(t, err)
outStr := out.String()
assert.Contains(t, outStr, "Generate commands")
}
func TestGeneratePrivateKeyCommand(t *testing.T) {
tests := []struct {
name string
args []string
expectJSON bool
expectYAML bool
}{
{
name: "default output",
args: []string{"generate", "private-key"},
expectJSON: false,
expectYAML: false,
},
{
name: "json output",
args: []string{"generate", "private-key", "--output", "json"},
expectJSON: true,
expectYAML: false,
},
{
name: "yaml output",
args: []string{"generate", "private-key", "--output", "yaml"},
expectJSON: false,
expectYAML: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Note: This command calls SuccessOutput which exits the process
// We can't test the actual execution easily without mocking
// Instead, we test the command structure and that it exists
cmd := &cobra.Command{
Use: "headscale",
Short: "headscale - a Tailscale control server",
}
cmd.AddCommand(generateCmd)
cmd.PersistentFlags().StringP("output", "o", "", "Output format")
// Test that the command exists and can be found
privateKeyCmd, _, err := cmd.Find([]string{"generate", "private-key"})
require.NoError(t, err)
assert.Equal(t, "private-key", privateKeyCmd.Name())
assert.Equal(t, "Generate a private key for the headscale server", privateKeyCmd.Short)
})
}
}
func TestGeneratePrivateKeyHelp(t *testing.T) {
cmd := &cobra.Command{
Use: "headscale",
Short: "headscale - a Tailscale control server",
}
cmd.AddCommand(generateCmd)
out := new(bytes.Buffer)
cmd.SetOut(out)
cmd.SetErr(out)
cmd.SetArgs([]string{"generate", "private-key", "--help"})
err := cmd.Execute()
require.NoError(t, err)
outStr := out.String()
assert.Contains(t, outStr, "Generate a private key for the headscale server")
assert.Contains(t, outStr, "Usage:")
}
// Test the key generation logic in isolation (without SuccessOutput/ErrorOutput)
func TestPrivateKeyGeneration(t *testing.T) {
// We can't easily test the full command because it calls SuccessOutput which exits
// But we can test that the key generation produces valid output format
// This is testing the core logic that would be in the command
// In a real refactor, we'd extract this to a testable function
// For now, we can test that the command structure is correct
assert.NotNil(t, generatePrivateKeyCmd)
assert.Equal(t, "private-key", generatePrivateKeyCmd.Use)
assert.Equal(t, "Generate a private key for the headscale server", generatePrivateKeyCmd.Short)
assert.NotNil(t, generatePrivateKeyCmd.Run)
}
func TestGenerateCommandStructure(t *testing.T) {
// Test the command hierarchy
assert.Equal(t, "generate", generateCmd.Use)
assert.Equal(t, "Generate commands", generateCmd.Short)
assert.Contains(t, generateCmd.Aliases, "gen")
// Test that private-key is a subcommand
found := false
for _, subcmd := range generateCmd.Commands() {
if subcmd.Name() == "private-key" {
found = true
break
}
}
assert.True(t, found, "private-key should be a subcommand of generate")
}
// Helper function to test output formats (would be used if we refactored the command)
func validatePrivateKeyOutput(t *testing.T, output string, format string) {
switch format {
case "json":
var result map[string]interface{}
err := json.Unmarshal([]byte(output), &result)
require.NoError(t, err, "Output should be valid JSON")
privateKey, exists := result["private_key"]
require.True(t, exists, "JSON should contain private_key field")
keyStr, ok := privateKey.(string)
require.True(t, ok, "private_key should be a string")
require.NotEmpty(t, keyStr, "private_key should not be empty")
// Basic validation that it looks like a machine key
assert.True(t, strings.HasPrefix(keyStr, "mkey:"), "Machine key should start with mkey:")
case "yaml":
var result map[string]interface{}
err := yaml.Unmarshal([]byte(output), &result)
require.NoError(t, err, "Output should be valid YAML")
privateKey, exists := result["private_key"]
require.True(t, exists, "YAML should contain private_key field")
keyStr, ok := privateKey.(string)
require.True(t, ok, "private_key should be a string")
require.NotEmpty(t, keyStr, "private_key should not be empty")
assert.True(t, strings.HasPrefix(keyStr, "mkey:"), "Machine key should start with mkey:")
default:
// Default format should just be the key itself
assert.True(t, strings.HasPrefix(output, "mkey:"), "Default output should be the machine key")
assert.NotContains(t, output, "{", "Default output should not contain JSON")
assert.NotContains(t, output, "private_key:", "Default output should not contain YAML structure")
}
}
func TestPrivateKeyOutputFormats(t *testing.T) {
// Test cases for different output formats
// These test the validation logic we would use after refactoring
tests := []struct {
format string
sample string
}{
{
format: "json",
sample: `{"private_key": "mkey:abcd1234567890abcd1234567890abcd1234567890abcd1234567890abcd1234"}`,
},
{
format: "yaml",
sample: "private_key: mkey:abcd1234567890abcd1234567890abcd1234567890abcd1234567890abcd1234\n",
},
{
format: "",
sample: "mkey:abcd1234567890abcd1234567890abcd1234567890abcd1234567890abcd1234",
},
}
for _, tt := range tests {
t.Run("format_"+tt.format, func(t *testing.T) {
validatePrivateKeyOutput(t, tt.sample, tt.format)
})
}
}

View File

@@ -0,0 +1,27 @@
package cli
import (
"context"
"fmt"
v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
"github.com/spf13/cobra"
)
func init() {
rootCmd.AddCommand(healthCmd)
}
var healthCmd = &cobra.Command{
Use: "health",
Short: "Check the health of the Headscale server",
Long: "Check the health of the Headscale server. This command will return an exit code of 0 if the server is healthy, or 1 if it is not.",
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
response, err := client.Health(ctx, &v1.HealthRequest{})
if err != nil {
return fmt.Errorf("checking health: %w", err)
}
return printOutput(cmd, response, "")
}),
}

View File

@@ -1,8 +1,8 @@
package cli
import (
"context"
"encoding/json"
"errors"
"fmt"
"net"
"net/http"
@@ -10,6 +10,7 @@ import (
"strconv"
"time"
"github.com/juanfont/headscale/hscontrol/util/zlog/zf"
"github.com/oauth2-proxy/mockoidc"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
@@ -24,6 +25,7 @@ const (
errMockOidcClientIDNotDefined = Error("MOCKOIDC_CLIENT_ID not defined")
errMockOidcClientSecretNotDefined = Error("MOCKOIDC_CLIENT_SECRET not defined")
errMockOidcPortNotDefined = Error("MOCKOIDC_PORT not defined")
errMockOidcUsersNotDefined = Error("MOCKOIDC_USERS not defined")
refreshTTL = 60 * time.Minute
)
@@ -37,12 +39,13 @@ var mockOidcCmd = &cobra.Command{
Use: "mockoidc",
Short: "Runs a mock OIDC server for testing",
Long: "This internal command runs a OpenID Connect for testing purposes",
Run: func(cmd *cobra.Command, args []string) {
RunE: func(cmd *cobra.Command, args []string) error {
err := mockOIDC()
if err != nil {
log.Error().Err(err).Msgf("Error running mock OIDC server")
os.Exit(1)
return fmt.Errorf("running mock OIDC server: %w", err)
}
return nil
},
}
@@ -51,41 +54,47 @@ func mockOIDC() error {
if clientID == "" {
return errMockOidcClientIDNotDefined
}
clientSecret := os.Getenv("MOCKOIDC_CLIENT_SECRET")
if clientSecret == "" {
return errMockOidcClientSecretNotDefined
}
addrStr := os.Getenv("MOCKOIDC_ADDR")
if addrStr == "" {
return errMockOidcPortNotDefined
}
portStr := os.Getenv("MOCKOIDC_PORT")
if portStr == "" {
return errMockOidcPortNotDefined
}
accessTTLOverride := os.Getenv("MOCKOIDC_ACCESS_TTL")
if accessTTLOverride != "" {
newTTL, err := time.ParseDuration(accessTTLOverride)
if err != nil {
return err
}
accessTTL = newTTL
}
userStr := os.Getenv("MOCKOIDC_USERS")
if userStr == "" {
return errors.New("MOCKOIDC_USERS not defined")
return errMockOidcUsersNotDefined
}
var users []mockoidc.MockUser
err := json.Unmarshal([]byte(userStr), &users)
if err != nil {
return fmt.Errorf("unmarshalling users: %w", err)
}
log.Info().Interface("users", users).Msg("loading users from JSON")
log.Info().Interface(zf.Users, users).Msg("loading users from JSON")
log.Info().Msgf("Access token TTL: %s", accessTTL)
log.Info().Msgf("access token TTL: %s", accessTTL)
port, err := strconv.Atoi(portStr)
if err != nil {
@@ -97,7 +106,7 @@ func mockOIDC() error {
return err
}
listener, err := net.Listen("tcp", fmt.Sprintf("%s:%d", addrStr, port))
listener, err := new(net.ListenConfig).Listen(context.Background(), "tcp", fmt.Sprintf("%s:%d", addrStr, port))
if err != nil {
return err
}
@@ -106,8 +115,10 @@ func mockOIDC() error {
if err != nil {
return err
}
log.Info().Msgf("Mock OIDC server listening on %s", listener.Addr().String())
log.Info().Msgf("Issuer: %s", mock.Issuer())
log.Info().Msgf("mock OIDC server listening on %s", listener.Addr().String())
log.Info().Msgf("issuer: %s", mock.Issuer())
c := make(chan struct{})
<-c
@@ -138,12 +149,13 @@ func getMockOIDC(clientID string, clientSecret string, users []mockoidc.MockUser
ErrorQueue: &mockoidc.ErrorQueue{},
}
mock.AddMiddleware(func(h http.Handler) http.Handler {
_ = mock.AddMiddleware(func(h http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
log.Info().Msgf("Request: %+v", r)
log.Info().Msgf("request: %+v", r)
h.ServeHTTP(w, r)
if r.Response != nil {
log.Info().Msgf("Response: %+v", r.Response)
log.Info().Msgf("response: %+v", r.Response)
}
})
})

File diff suppressed because it is too large Load Diff

View File

@@ -1,33 +1,54 @@
package cli
import (
"context"
"errors"
"fmt"
"io"
"os"
v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
"github.com/juanfont/headscale/hscontrol/db"
"github.com/juanfont/headscale/hscontrol/policy"
"github.com/juanfont/headscale/hscontrol/types"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
"tailscale.com/types/views"
)
const (
bypassFlag = "bypass-grpc-and-access-database-directly" //nolint:gosec // not a credential
)
var errAborted = errors.New("command aborted by user")
// bypassDatabase loads the server config and opens the database directly,
// bypassing the gRPC server. The caller is responsible for closing the
// returned database handle.
func bypassDatabase() (*db.HSDatabase, error) {
cfg, err := types.LoadServerConfig()
if err != nil {
return nil, fmt.Errorf("loading config: %w", err)
}
d, err := db.NewHeadscaleDatabase(cfg, nil)
if err != nil {
return nil, fmt.Errorf("opening database: %w", err)
}
return d, nil
}
func init() {
rootCmd.AddCommand(policyCmd)
getPolicy.Flags().BoolP(bypassFlag, "", false, "Uses the headscale config to directly access the database, bypassing gRPC and does not require the server to be running")
policyCmd.AddCommand(getPolicy)
setPolicy.Flags().StringP("file", "f", "", "Path to a policy file in HuJSON format")
if err := setPolicy.MarkFlagRequired("file"); err != nil {
log.Fatal().Err(err).Msg("")
}
setPolicy.Flags().BoolP(bypassFlag, "", false, "Uses the headscale config to directly access the database, bypassing gRPC and does not require the server to be running")
mustMarkRequired(setPolicy, "file")
policyCmd.AddCommand(setPolicy)
checkPolicy.Flags().StringP("file", "f", "", "Path to a policy file in HuJSON format")
if err := checkPolicy.MarkFlagRequired("file"); err != nil {
log.Fatal().Err(err).Msg("")
}
mustMarkRequired(checkPolicy, "file")
policyCmd.AddCommand(checkPolicy)
}
@@ -40,27 +61,46 @@ var getPolicy = &cobra.Command{
Use: "get",
Short: "Print the current ACL Policy",
Aliases: []string{"show", "view", "fetch"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.GetPolicyRequest{}
response, err := client.GetPolicy(ctx, request)
if err != nil {
ErrorOutput(err, fmt.Sprintf("Failed loading ACL Policy: %s", err), output)
return err
RunE: func(cmd *cobra.Command, args []string) error {
var policyData string
if bypass, _ := cmd.Flags().GetBool(bypassFlag); bypass {
if !confirmAction(cmd, "DO NOT run this command if an instance of headscale is running, are you sure headscale is not running?") {
return errAborted
}
// TODO(pallabpain): Maybe print this better?
// This does not pass output as we dont support yaml, json or json-line
// output for this command. It is HuJSON already.
SuccessOutput("", response.GetPolicy(), "")
return nil
})
if err != nil {
return
d, err := bypassDatabase()
if err != nil {
return err
}
defer d.Close()
pol, err := d.GetPolicy()
if err != nil {
return fmt.Errorf("loading policy from database: %w", err)
}
policyData = pol.Data
} else {
ctx, client, conn, cancel, err := newHeadscaleCLIWithConfig()
if err != nil {
return fmt.Errorf("connecting to headscale: %w", err)
}
defer cancel()
defer conn.Close()
response, err := client.GetPolicy(ctx, &v1.GetPolicyRequest{})
if err != nil {
return fmt.Errorf("loading ACL policy: %w", err)
}
policyData = response.GetPolicy()
}
// This does not pass output format as we don't support yaml, json or
// json-line output for this command. It is HuJSON already.
fmt.Println(policyData)
return nil
},
}
@@ -71,66 +111,79 @@ var setPolicy = &cobra.Command{
Updates the existing ACL Policy with the provided policy. The policy must be a valid HuJSON object.
This command only works when the acl.policy_mode is set to "db", and the policy will be stored in the database.`,
Aliases: []string{"put", "update"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
RunE: func(cmd *cobra.Command, args []string) error {
policyPath, _ := cmd.Flags().GetString("file")
f, err := os.Open(policyPath)
policyBytes, err := os.ReadFile(policyPath)
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error opening the policy file: %s", err), output)
return
}
defer f.Close()
policyBytes, err := io.ReadAll(f)
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error reading the policy file: %s", err), output)
return
return fmt.Errorf("reading policy file: %w", err)
}
request := &v1.SetPolicyRequest{Policy: string(policyBytes)}
err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
if _, err := client.SetPolicy(ctx, request); err != nil {
ErrorOutput(err, fmt.Sprintf("Failed to set ACL Policy: %s", err), output)
return err
if bypass, _ := cmd.Flags().GetBool(bypassFlag); bypass {
if !confirmAction(cmd, "DO NOT run this command if an instance of headscale is running, are you sure headscale is not running?") {
return errAborted
}
SuccessOutput(nil, "Policy updated.", "")
return nil
})
if err != nil {
return
d, err := bypassDatabase()
if err != nil {
return err
}
defer d.Close()
users, err := d.ListUsers()
if err != nil {
return fmt.Errorf("loading users for policy validation: %w", err)
}
_, err = policy.NewPolicyManager(policyBytes, users, views.Slice[types.NodeView]{})
if err != nil {
return fmt.Errorf("parsing policy file: %w", err)
}
_, err = d.SetPolicy(string(policyBytes))
if err != nil {
return fmt.Errorf("setting ACL policy: %w", err)
}
} else {
request := &v1.SetPolicyRequest{Policy: string(policyBytes)}
ctx, client, conn, cancel, err := newHeadscaleCLIWithConfig()
if err != nil {
return fmt.Errorf("connecting to headscale: %w", err)
}
defer cancel()
defer conn.Close()
_, err = client.SetPolicy(ctx, request)
if err != nil {
return fmt.Errorf("setting ACL policy: %w", err)
}
}
fmt.Println("Policy updated.")
return nil
},
}
var checkPolicy = &cobra.Command{
Use: "check",
Short: "Check the Policy file for errors",
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
RunE: func(cmd *cobra.Command, args []string) error {
policyPath, _ := cmd.Flags().GetString("file")
f, err := os.Open(policyPath)
policyBytes, err := os.ReadFile(policyPath)
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error opening the policy file: %s", err), output)
return
}
defer f.Close()
policyBytes, err := io.ReadAll(f)
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error reading the policy file: %s", err), output)
return
return fmt.Errorf("reading policy file: %w", err)
}
_, err = policy.NewPolicyManager(policyBytes, nil, views.Slice[types.NodeView]{})
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error parsing the policy file: %s", err), output)
return
return fmt.Errorf("parsing policy file: %w", err)
}
SuccessOutput(nil, "Policy is valid", "")
fmt.Println("Policy is valid")
return nil
},
}

View File

@@ -5,27 +5,23 @@ import (
"fmt"
"strconv"
"strings"
"time"
v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
"github.com/prometheus/common/model"
"github.com/juanfont/headscale/hscontrol/util"
"github.com/pterm/pterm"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
"google.golang.org/protobuf/types/known/timestamppb"
)
const (
DefaultPreAuthKeyExpiry = "1h"
)
func init() {
rootCmd.AddCommand(preauthkeysCmd)
preauthkeysCmd.PersistentFlags().Uint64P("user", "u", 0, "User identifier (ID)")
err := preauthkeysCmd.MarkPersistentFlagRequired("user")
if err != nil {
log.Fatal().Err(err).Msg("")
}
preauthkeysCmd.AddCommand(listPreAuthKeys)
preauthkeysCmd.AddCommand(createPreAuthKeyCmd)
preauthkeysCmd.AddCommand(expirePreAuthKeyCmd)
preauthkeysCmd.AddCommand(deletePreAuthKeyCmd)
createPreAuthKeyCmd.PersistentFlags().
Bool("reusable", false, "Make the preauthkey reusable")
createPreAuthKeyCmd.PersistentFlags().
@@ -34,6 +30,9 @@ func init() {
StringP("expiration", "e", DefaultPreAuthKeyExpiry, "Human-readable expiration of the key (e.g. 30m, 24h)")
createPreAuthKeyCmd.Flags().
StringSlice("tags", []string{}, "Tags to automatically assign to node")
createPreAuthKeyCmd.PersistentFlags().Uint64P("user", "u", 0, "User identifier (ID)")
expirePreAuthKeyCmd.PersistentFlags().Uint64P("id", "i", 0, "Authkey ID")
deletePreAuthKeyCmd.PersistentFlags().Uint64P("id", "i", 0, "Authkey ID")
}
var preauthkeysCmd = &cobra.Command{
@@ -44,196 +43,136 @@ var preauthkeysCmd = &cobra.Command{
var listPreAuthKeys = &cobra.Command{
Use: "list",
Short: "List the preauthkeys for this user",
Short: "List all preauthkeys",
Aliases: []string{"ls", "show"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
user, err := cmd.Flags().GetUint64("user")
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
response, err := client.ListPreAuthKeys(ctx, &v1.ListPreAuthKeysRequest{})
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error getting user: %s", err), output)
return
return fmt.Errorf("listing preauthkeys: %w", err)
}
err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.ListPreAuthKeysRequest{
User: user,
}
response, err := client.ListPreAuthKeys(ctx, request)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Error getting the list of keys: %s", err),
output,
)
return err
}
if output != "" {
SuccessOutput(response.GetPreAuthKeys(), "", output)
return nil
}
return printListOutput(cmd, response.GetPreAuthKeys(), func() error {
tableData := pterm.TableData{
{
"ID",
"Key",
"Key/Prefix",
"Reusable",
"Ephemeral",
"Used",
"Expiration",
"Created",
"Tags",
"Owner",
},
}
for _, key := range response.GetPreAuthKeys() {
expiration := "-"
if key.GetExpiration() != nil {
expiration = ColourTime(key.GetExpiration().AsTime())
}
aclTags := ""
for _, tag := range key.GetAclTags() {
aclTags += "," + tag
var owner string
if len(key.GetAclTags()) > 0 {
owner = strings.Join(key.GetAclTags(), "\n")
} else if key.GetUser() != nil {
owner = key.GetUser().GetName()
} else {
owner = "-"
}
aclTags = strings.TrimLeft(aclTags, ",")
tableData = append(tableData, []string{
strconv.FormatUint(key.GetId(), 10),
strconv.FormatUint(key.GetId(), util.Base10),
key.GetKey(),
strconv.FormatBool(key.GetReusable()),
strconv.FormatBool(key.GetEphemeral()),
strconv.FormatBool(key.GetUsed()),
expiration,
key.GetCreatedAt().AsTime().Format(HeadscaleDateTimeFormat),
aclTags,
owner,
})
}
}
err = pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Failed to render pterm table: %s", err),
output,
)
return err
}
return nil
return pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
})
if err != nil {
return
}
},
}),
}
var createPreAuthKeyCmd = &cobra.Command{
Use: "create",
Short: "Creates a new preauthkey in the specified user",
Short: "Creates a new preauthkey",
Aliases: []string{"c", "new"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
user, err := cmd.Flags().GetUint64("user")
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error getting user: %s", err), output)
return
}
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
user, _ := cmd.Flags().GetUint64("user")
reusable, _ := cmd.Flags().GetBool("reusable")
ephemeral, _ := cmd.Flags().GetBool("ephemeral")
tags, _ := cmd.Flags().GetStringSlice("tags")
expiration, err := expirationFromFlag(cmd)
if err != nil {
return err
}
request := &v1.CreatePreAuthKeyRequest{
User: user,
Reusable: reusable,
Ephemeral: ephemeral,
AclTags: tags,
User: user,
Reusable: reusable,
Ephemeral: ephemeral,
AclTags: tags,
Expiration: expiration,
}
durationStr, _ := cmd.Flags().GetString("expiration")
duration, err := model.ParseDuration(durationStr)
response, err := client.CreatePreAuthKey(ctx, request)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Could not parse duration: %s\n", err),
output,
)
return
return fmt.Errorf("creating preauthkey: %w", err)
}
expiration := time.Now().UTC().Add(time.Duration(duration))
log.Trace().
Dur("expiration", time.Duration(duration)).
Msg("expiration has been set")
request.Expiration = timestamppb.New(expiration)
err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
response, err := client.CreatePreAuthKey(ctx, request)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Cannot create Pre Auth Key: %s\n", err),
output,
)
return err
}
SuccessOutput(response.GetPreAuthKey(), response.GetPreAuthKey().GetKey(), output)
return nil
})
if err != nil {
return
}
},
return printOutput(cmd, response.GetPreAuthKey(), response.GetPreAuthKey().GetKey())
}),
}
var expirePreAuthKeyCmd = &cobra.Command{
Use: "expire KEY",
Use: "expire",
Short: "Expire a preauthkey",
Aliases: []string{"revoke", "exp", "e"},
Args: func(cmd *cobra.Command, args []string) error {
if len(args) < 1 {
return errMissingParameter
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
id, _ := cmd.Flags().GetUint64("id")
if id == 0 {
return fmt.Errorf("missing --id parameter: %w", errMissingParameter)
}
return nil
},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
user, err := cmd.Flags().GetUint64("user")
request := &v1.ExpirePreAuthKeyRequest{
Id: id,
}
response, err := client.ExpirePreAuthKey(ctx, request)
if err != nil {
ErrorOutput(err, fmt.Sprintf("Error getting user: %s", err), output)
return
return fmt.Errorf("expiring preauthkey: %w", err)
}
err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.ExpirePreAuthKeyRequest{
User: user,
Key: args[0],
}
response, err := client.ExpirePreAuthKey(ctx, request)
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Cannot expire Pre Auth Key: %s\n", err),
output,
)
return err
}
SuccessOutput(response, "Key expired", output)
return nil
})
if err != nil {
return
}
},
return printOutput(cmd, response, "Key expired")
}),
}
var deletePreAuthKeyCmd = &cobra.Command{
Use: "delete",
Short: "Delete a preauthkey",
Aliases: []string{"del", "rm", "d"},
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
id, _ := cmd.Flags().GetUint64("id")
if id == 0 {
return fmt.Errorf("missing --id parameter: %w", errMissingParameter)
}
request := &v1.DeletePreAuthKeyRequest{
Id: id,
}
response, err := client.DeletePreAuthKey(ctx, request)
if err != nil {
return fmt.Errorf("deleting preauthkey: %w", err)
}
return printOutput(cmd, response, "Key deleted")
}),
}

View File

@@ -1,10 +1,10 @@
package cli
import (
"fmt"
"os"
"runtime"
"slices"
"strings"
"github.com/juanfont/headscale/hscontrol/types"
"github.com/rs/zerolog"
@@ -34,25 +34,34 @@ func init() {
StringP("output", "o", "", "Output format. Empty for human-readable, 'json', 'json-line' or 'yaml'")
rootCmd.PersistentFlags().
Bool("force", false, "Disable prompts and forces the execution")
// Re-enable usage output only for flag-parsing errors; runtime errors
// from RunE should never dump usage text.
rootCmd.SetFlagErrorFunc(func(cmd *cobra.Command, err error) error {
cmd.SilenceUsage = false
return err
})
}
func initConfig() {
if cfgFile == "" {
cfgFile = os.Getenv("HEADSCALE_CONFIG")
}
if cfgFile != "" {
err := types.LoadConfig(cfgFile, true)
if err != nil {
log.Fatal().Caller().Err(err).Msgf("Error loading config file %s", cfgFile)
log.Fatal().Caller().Err(err).Msgf("error loading config file %s", cfgFile)
}
} else {
err := types.LoadConfig("", false)
if err != nil {
log.Fatal().Caller().Err(err).Msgf("Error loading config")
log.Fatal().Caller().Err(err).Msgf("error loading config")
}
}
machineOutput := HasMachineOutputFlag()
machineOutput := hasMachineOutputFlag()
// If the user has requested a "node" readable format,
// then disable login so the output remains valid.
@@ -67,25 +76,66 @@ func initConfig() {
disableUpdateCheck := viper.GetBool("disable_check_updates")
if !disableUpdateCheck && !machineOutput {
versionInfo := types.GetVersionInfo()
if (runtime.GOOS == "linux" || runtime.GOOS == "darwin") &&
types.Version != "dev" {
!versionInfo.Dirty {
githubTag := &latest.GithubTag{
Owner: "juanfont",
Repository: "headscale",
Owner: "juanfont",
Repository: "headscale",
TagFilterFunc: filterPreReleasesIfStable(func() string { return versionInfo.Version }),
}
res, err := latest.Check(githubTag, types.Version)
res, err := latest.Check(githubTag, versionInfo.Version)
if err == nil && res.Outdated {
//nolint
log.Warn().Msgf(
"An updated version of Headscale has been found (%s vs. your current %s). Check it out https://github.com/juanfont/headscale/releases\n",
res.Current,
types.Version,
versionInfo.Version,
)
}
}
}
}
var prereleases = []string{"alpha", "beta", "rc", "dev"}
func isPreReleaseVersion(version string) bool {
for _, unstable := range prereleases {
if strings.Contains(version, unstable) {
return true
}
}
return false
}
// filterPreReleasesIfStable returns a function that filters out
// pre-release tags if the current version is stable.
// If the current version is a pre-release, it does not filter anything.
// versionFunc is a function that returns the current version string, it is
// a func for testability.
func filterPreReleasesIfStable(versionFunc func() string) func(string) bool {
return func(tag string) bool {
version := versionFunc()
// If we are on a pre-release version, then we do not filter anything
// as we want to recommend the user the latest pre-release.
if isPreReleaseVersion(version) {
return false
}
// If we are on a stable release, filter out pre-releases.
for _, ignore := range prereleases {
if strings.Contains(tag, ignore) {
return true
}
}
return false
}
}
var rootCmd = &cobra.Command{
Use: "headscale",
Short: "headscale - a Tailscale control server",
@@ -93,11 +143,15 @@ var rootCmd = &cobra.Command{
headscale is an open source implementation of the Tailscale control server
https://github.com/juanfont/headscale`,
SilenceErrors: true,
SilenceUsage: true,
}
func Execute() {
if err := rootCmd.Execute(); err != nil {
fmt.Fprintln(os.Stderr, err)
cmd, err := rootCmd.ExecuteC()
if err != nil {
outputFormat, _ := cmd.Flags().GetString("output")
printError(err, outputFormat)
os.Exit(1)
}
}

View File

@@ -0,0 +1,293 @@
package cli
import (
"testing"
)
func TestFilterPreReleasesIfStable(t *testing.T) {
tests := []struct {
name string
currentVersion string
tag string
expectedFilter bool
description string
}{
{
name: "stable version filters alpha tag",
currentVersion: "0.23.0",
tag: "v0.24.0-alpha.1",
expectedFilter: true,
description: "When on stable release, alpha tags should be filtered",
},
{
name: "stable version filters beta tag",
currentVersion: "0.23.0",
tag: "v0.24.0-beta.2",
expectedFilter: true,
description: "When on stable release, beta tags should be filtered",
},
{
name: "stable version filters rc tag",
currentVersion: "0.23.0",
tag: "v0.24.0-rc.1",
expectedFilter: true,
description: "When on stable release, rc tags should be filtered",
},
{
name: "stable version allows stable tag",
currentVersion: "0.23.0",
tag: "v0.24.0",
expectedFilter: false,
description: "When on stable release, stable tags should not be filtered",
},
{
name: "alpha version allows alpha tag",
currentVersion: "0.23.0-alpha.1",
tag: "v0.24.0-alpha.2",
expectedFilter: false,
description: "When on alpha release, alpha tags should not be filtered",
},
{
name: "alpha version allows beta tag",
currentVersion: "0.23.0-alpha.1",
tag: "v0.24.0-beta.1",
expectedFilter: false,
description: "When on alpha release, beta tags should not be filtered",
},
{
name: "alpha version allows rc tag",
currentVersion: "0.23.0-alpha.1",
tag: "v0.24.0-rc.1",
expectedFilter: false,
description: "When on alpha release, rc tags should not be filtered",
},
{
name: "alpha version allows stable tag",
currentVersion: "0.23.0-alpha.1",
tag: "v0.24.0",
expectedFilter: false,
description: "When on alpha release, stable tags should not be filtered",
},
{
name: "beta version allows alpha tag",
currentVersion: "0.23.0-beta.1",
tag: "v0.24.0-alpha.1",
expectedFilter: false,
description: "When on beta release, alpha tags should not be filtered",
},
{
name: "beta version allows beta tag",
currentVersion: "0.23.0-beta.2",
tag: "v0.24.0-beta.3",
expectedFilter: false,
description: "When on beta release, beta tags should not be filtered",
},
{
name: "beta version allows rc tag",
currentVersion: "0.23.0-beta.1",
tag: "v0.24.0-rc.1",
expectedFilter: false,
description: "When on beta release, rc tags should not be filtered",
},
{
name: "beta version allows stable tag",
currentVersion: "0.23.0-beta.1",
tag: "v0.24.0",
expectedFilter: false,
description: "When on beta release, stable tags should not be filtered",
},
{
name: "rc version allows alpha tag",
currentVersion: "0.23.0-rc.1",
tag: "v0.24.0-alpha.1",
expectedFilter: false,
description: "When on rc release, alpha tags should not be filtered",
},
{
name: "rc version allows beta tag",
currentVersion: "0.23.0-rc.1",
tag: "v0.24.0-beta.1",
expectedFilter: false,
description: "When on rc release, beta tags should not be filtered",
},
{
name: "rc version allows rc tag",
currentVersion: "0.23.0-rc.2",
tag: "v0.24.0-rc.3",
expectedFilter: false,
description: "When on rc release, rc tags should not be filtered",
},
{
name: "rc version allows stable tag",
currentVersion: "0.23.0-rc.1",
tag: "v0.24.0",
expectedFilter: false,
description: "When on rc release, stable tags should not be filtered",
},
{
name: "stable version with patch filters alpha",
currentVersion: "0.23.1",
tag: "v0.24.0-alpha.1",
expectedFilter: true,
description: "Stable version with patch number should filter alpha tags",
},
{
name: "stable version with patch allows stable",
currentVersion: "0.23.1",
tag: "v0.24.0",
expectedFilter: false,
description: "Stable version with patch number should allow stable tags",
},
{
name: "tag with alpha substring in version number",
currentVersion: "0.23.0",
tag: "v1.0.0-alpha.1",
expectedFilter: true,
description: "Tags with alpha in version string should be filtered on stable",
},
{
name: "tag with beta substring in version number",
currentVersion: "0.23.0",
tag: "v1.0.0-beta.1",
expectedFilter: true,
description: "Tags with beta in version string should be filtered on stable",
},
{
name: "tag with rc substring in version number",
currentVersion: "0.23.0",
tag: "v1.0.0-rc.1",
expectedFilter: true,
description: "Tags with rc in version string should be filtered on stable",
},
{
name: "empty tag on stable version",
currentVersion: "0.23.0",
tag: "",
expectedFilter: false,
description: "Empty tags should not be filtered",
},
{
name: "dev version allows all tags",
currentVersion: "0.23.0-dev",
tag: "v0.24.0-alpha.1",
expectedFilter: false,
description: "Dev versions should not filter any tags (pre-release allows all)",
},
{
name: "stable version filters dev tag",
currentVersion: "0.23.0",
tag: "v0.24.0-dev",
expectedFilter: true,
description: "When on stable release, dev tags should be filtered",
},
{
name: "dev version allows dev tag",
currentVersion: "0.23.0-dev",
tag: "v0.24.0-dev.1",
expectedFilter: false,
description: "When on dev release, dev tags should not be filtered",
},
{
name: "dev version allows stable tag",
currentVersion: "0.23.0-dev",
tag: "v0.24.0",
expectedFilter: false,
description: "When on dev release, stable tags should not be filtered",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := filterPreReleasesIfStable(func() string { return tt.currentVersion })(tt.tag)
if result != tt.expectedFilter {
t.Errorf("%s: got %v, want %v\nDescription: %s\nCurrent version: %s, Tag: %s",
tt.name,
result,
tt.expectedFilter,
tt.description,
tt.currentVersion,
tt.tag,
)
}
})
}
}
func TestIsPreReleaseVersion(t *testing.T) {
tests := []struct {
name string
version string
expected bool
description string
}{
{
name: "stable version",
version: "0.23.0",
expected: false,
description: "Stable version should not be pre-release",
},
{
name: "alpha version",
version: "0.23.0-alpha.1",
expected: true,
description: "Alpha version should be pre-release",
},
{
name: "beta version",
version: "0.23.0-beta.1",
expected: true,
description: "Beta version should be pre-release",
},
{
name: "rc version",
version: "0.23.0-rc.1",
expected: true,
description: "RC version should be pre-release",
},
{
name: "version with alpha substring",
version: "0.23.0-alphabetical",
expected: true,
description: "Version containing 'alpha' should be pre-release",
},
{
name: "version with beta substring",
version: "0.23.0-betamax",
expected: true,
description: "Version containing 'beta' should be pre-release",
},
{
name: "dev version",
version: "0.23.0-dev",
expected: true,
description: "Dev version should be pre-release",
},
{
name: "empty version",
version: "",
expected: false,
description: "Empty version should not be pre-release",
},
{
name: "version with patch number",
version: "0.23.1",
expected: false,
description: "Stable version with patch should not be pre-release",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := isPreReleaseVersion(tt.version)
if result != tt.expected {
t.Errorf("%s: got %v, want %v\nDescription: %s\nVersion: %s",
tt.name,
result,
tt.expected,
tt.description,
tt.version,
)
}
})
}
}

View File

@@ -5,7 +5,6 @@ import (
"fmt"
"net/http"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
"github.com/tailscale/squibble"
)
@@ -17,24 +16,22 @@ func init() {
var serveCmd = &cobra.Command{
Use: "serve",
Short: "Launches the headscale server",
Args: func(cmd *cobra.Command, args []string) error {
return nil
},
Run: func(cmd *cobra.Command, args []string) {
RunE: func(cmd *cobra.Command, args []string) error {
app, err := newHeadscaleServerWithConfig()
if err != nil {
var squibbleErr squibble.ValidationError
if errors.As(err, &squibbleErr) {
if squibbleErr, ok := errors.AsType[squibble.ValidationError](err); ok {
fmt.Printf("SQLite schema failed to validate:\n")
fmt.Println(squibbleErr.Diff)
}
log.Fatal().Caller().Err(err).Msg("Error initializing")
return fmt.Errorf("initializing: %w", err)
}
err = app.Serve()
if err != nil && !errors.Is(err, http.ErrServerClosed) {
log.Fatal().Caller().Err(err).Msg("Headscale ran into an error and had to shut down.")
return fmt.Errorf("headscale ran into an error and had to shut down: %w", err)
}
return nil
},
}

View File

@@ -1,70 +0,0 @@
package cli
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestServeCommand(t *testing.T) {
// Test that the serve command exists and is properly configured
assert.NotNil(t, serveCmd)
assert.Equal(t, "serve", serveCmd.Use)
assert.Equal(t, "Launches the headscale server", serveCmd.Short)
assert.NotNil(t, serveCmd.Run)
assert.NotNil(t, serveCmd.Args)
}
func TestServeCommandInRootCommand(t *testing.T) {
// Test that serve is available as a subcommand of root
cmd, _, err := rootCmd.Find([]string{"serve"})
require.NoError(t, err)
assert.Equal(t, "serve", cmd.Name())
assert.Equal(t, serveCmd, cmd)
}
func TestServeCommandArgs(t *testing.T) {
// Test that the Args function is defined and accepts any arguments
// The current implementation always returns nil (accepts any args)
assert.NotNil(t, serveCmd.Args)
// Test the args function directly
err := serveCmd.Args(serveCmd, []string{})
assert.NoError(t, err, "Args function should accept empty arguments")
err = serveCmd.Args(serveCmd, []string{"extra", "args"})
assert.NoError(t, err, "Args function should accept extra arguments")
}
func TestServeCommandHelp(t *testing.T) {
// Test that the command has proper help text
assert.NotEmpty(t, serveCmd.Short)
assert.Contains(t, serveCmd.Short, "server")
assert.Contains(t, serveCmd.Short, "headscale")
}
func TestServeCommandStructure(t *testing.T) {
// Test basic command structure
assert.Equal(t, "serve", serveCmd.Name())
assert.Equal(t, "Launches the headscale server", serveCmd.Short)
// Test that it has no subcommands (it's a leaf command)
subcommands := serveCmd.Commands()
assert.Empty(t, subcommands, "Serve command should not have subcommands")
}
// Note: We can't easily test the actual execution of serve because:
// 1. It depends on configuration files being present and valid
// 2. It calls log.Fatal() which would exit the test process
// 3. It tries to start an actual HTTP server which would block forever
// 4. It requires database connections and other infrastructure
//
// In a real refactor, we would:
// 1. Extract server initialization logic to a testable function
// 2. Use dependency injection for configuration and dependencies
// 3. Return errors instead of calling log.Fatal()
// 4. Add graceful shutdown capabilities for testing
// 5. Allow server startup to be cancelled via context
//
// For now, we test the command structure and basic properties.

View File

@@ -1,55 +0,0 @@
package cli
import (
"strings"
"github.com/pterm/pterm"
"github.com/spf13/cobra"
)
const (
HeadscaleDateTimeFormat = "2006-01-02 15:04:05"
DefaultAPIKeyExpiry = "90d"
DefaultPreAuthKeyExpiry = "1h"
)
// FilterTableColumns filters table columns based on --columns flag
func FilterTableColumns(cmd *cobra.Command, tableData pterm.TableData) pterm.TableData {
columns, _ := cmd.Flags().GetString("columns")
if columns == "" || len(tableData) == 0 {
return tableData
}
headers := tableData[0]
wantedColumns := strings.Split(columns, ",")
// Find column indices
var indices []int
for _, wanted := range wantedColumns {
wanted = strings.TrimSpace(wanted)
for i, header := range headers {
if strings.EqualFold(header, wanted) {
indices = append(indices, i)
break
}
}
}
if len(indices) == 0 {
return tableData
}
// Filter all rows
filtered := make(pterm.TableData, len(tableData))
for i, row := range tableData {
newRow := make([]string, len(indices))
for j, idx := range indices {
if idx < len(row) {
newRow[j] = row[idx]
}
}
filtered[i] = newRow
}
return filtered
}

View File

@@ -6,34 +6,42 @@ import (
"fmt"
"net/url"
"strconv"
"strings"
survey "github.com/AlecAivazis/survey/v2"
v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
"github.com/juanfont/headscale/hscontrol/util"
"github.com/juanfont/headscale/hscontrol/util/zlog/zf"
"github.com/pterm/pterm"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
"google.golang.org/grpc/status"
)
// CLI user errors.
var (
errFlagRequired = errors.New("--name or --identifier flag is required")
errMultipleUsersMatch = errors.New("multiple users match query, specify an ID")
)
func usernameAndIDFlag(cmd *cobra.Command) {
cmd.Flags().StringP("user", "u", "", "User identifier (ID, name, or email)")
cmd.Flags().Int64P("identifier", "i", -1, "User identifier (ID)")
cmd.Flags().StringP("name", "n", "", "Username")
}
// userIDFromFlag returns the user ID using smart lookup.
// If no user is specified, it will exit the program with an error.
func userIDFromFlag(cmd *cobra.Command) uint64 {
userID, err := GetUserIdentifier(cmd)
if err != nil {
ErrorOutput(
err,
"Cannot identify user: "+err.Error(),
GetOutputFlag(cmd),
)
// usernameAndIDFromFlag returns the username and ID from the flags of the command.
func usernameAndIDFromFlag(cmd *cobra.Command) (uint64, string, error) {
username, _ := cmd.Flags().GetString("name")
identifier, _ := cmd.Flags().GetInt64("identifier")
if username == "" && identifier < 0 {
return 0, "", errFlagRequired
}
return userID
// Normalise unset/negative identifiers to 0 so the uint64
// conversion does not produce a bogus large value.
if identifier < 0 {
identifier = 0
}
return uint64(identifier), username, nil //nolint:gosec // identifier is clamped to >= 0 above
}
func init() {
@@ -43,26 +51,20 @@ func init() {
createUserCmd.Flags().StringP("email", "e", "", "Email")
createUserCmd.Flags().StringP("picture-url", "p", "", "Profile picture URL")
userCmd.AddCommand(listUsersCmd)
// Smart lookup filters - can be used individually or combined
listUsersCmd.Flags().StringP("user", "u", "", "Filter by user (ID, name, or email)")
listUsersCmd.Flags().Uint64P("id", "", 0, "Filter by user ID")
listUsersCmd.Flags().StringP("name", "n", "", "Filter by username")
listUsersCmd.Flags().StringP("email", "e", "", "Filter by email address")
listUsersCmd.Flags().String("columns", "", "Comma-separated list of columns to display (ID,Name,Username,Email,Created)")
usernameAndIDFlag(listUsersCmd)
listUsersCmd.Flags().StringP("email", "e", "", "Email")
userCmd.AddCommand(destroyUserCmd)
usernameAndIDFlag(destroyUserCmd)
userCmd.AddCommand(renameUserCmd)
usernameAndIDFlag(renameUserCmd)
renameUserCmd.Flags().StringP("new-name", "r", "", "New username")
renameUserCmd.MarkFlagRequired("new-name")
mustMarkRequired(renameUserCmd, "new-name")
}
var errMissingParameter = errors.New("missing parameters")
var userCmd = &cobra.Command{
Use: "users",
Short: "Manage the users of Headscale",
Aliases: []string{"user", "namespace", "namespaces", "ns"},
Aliases: []string{"user"},
}
var createUserCmd = &cobra.Command{
@@ -76,10 +78,11 @@ var createUserCmd = &cobra.Command{
return nil
},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
userName := args[0]
log.Trace().Interface(zf.Client, client).Msg("obtained gRPC client")
request := &v1.CreateUserRequest{Name: userName}
if displayName, _ := cmd.Flags().GetString("display-name"); displayName != "" {
@@ -91,177 +94,101 @@ var createUserCmd = &cobra.Command{
}
if pictureURL, _ := cmd.Flags().GetString("picture-url"); pictureURL != "" {
if _, err := url.Parse(pictureURL); err != nil {
ErrorOutput(
err,
fmt.Sprintf(
"Invalid Picture URL: %s",
err,
),
output,
)
return
if _, err := url.Parse(pictureURL); err != nil { //nolint:noinlineerr
return fmt.Errorf("invalid picture URL: %w", err)
}
request.PictureUrl = pictureURL
}
err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
log.Trace().Interface("client", client).Msg("Obtained gRPC client")
log.Trace().Interface("request", request).Msg("Sending CreateUser request")
log.Trace().Interface(zf.Request, request).Msg("sending CreateUser request")
response, err := client.CreateUser(ctx, request)
if err != nil {
ErrorOutput(
err,
"Cannot create user: "+status.Convert(err).Message(),
output,
)
return err
}
SuccessOutput(response.GetUser(), "User created", output)
return nil
})
response, err := client.CreateUser(ctx, request)
if err != nil {
return
return fmt.Errorf("creating user: %w", err)
}
},
return printOutput(cmd, response.GetUser(), "User created")
}),
}
var destroyUserCmd = &cobra.Command{
Use: "destroy --user USER",
Use: "destroy --identifier ID or --name NAME",
Short: "Destroys a user",
Aliases: []string{"delete"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
id := userIDFromFlag(cmd)
request := &v1.ListUsersRequest{
Id: id,
}
var user *v1.User
err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
users, err := client.ListUsers(ctx, request)
if err != nil {
ErrorOutput(
err,
"Error: "+status.Convert(err).Message(),
output,
)
return err
}
if len(users.GetUsers()) != 1 {
err := errors.New("Unable to determine user to delete, query returned multiple users, use ID")
ErrorOutput(
err,
"Error: "+status.Convert(err).Message(),
output,
)
return err
}
user = users.GetUsers()[0]
return nil
})
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
id, username, err := usernameAndIDFromFlag(cmd)
if err != nil {
return
return err
}
confirm := false
force, _ := cmd.Flags().GetBool("force")
if !force {
prompt := &survey.Confirm{
Message: fmt.Sprintf(
"Do you want to remove the user %q (%d) and any associated preauthkeys?",
user.GetName(), user.GetId(),
),
}
err := survey.AskOne(prompt, &confirm)
if err != nil {
return
}
request := &v1.ListUsersRequest{
Name: username,
Id: id,
}
if confirm || force {
err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.DeleteUserRequest{Id: user.GetId()}
response, err := client.DeleteUser(ctx, request)
if err != nil {
ErrorOutput(
err,
"Cannot destroy user: "+status.Convert(err).Message(),
output,
)
return err
}
SuccessOutput(response, "User destroyed", output)
return nil
})
if err != nil {
return
}
} else {
SuccessOutput(map[string]string{"Result": "User not destroyed"}, "User not destroyed", output)
users, err := client.ListUsers(ctx, request)
if err != nil {
return fmt.Errorf("listing users: %w", err)
}
},
if len(users.GetUsers()) != 1 {
return errMultipleUsersMatch
}
user := users.GetUsers()[0]
if !confirmAction(cmd, fmt.Sprintf(
"Do you want to remove the user %q (%d) and any associated preauthkeys?",
user.GetName(), user.GetId(),
)) {
return printOutput(cmd, map[string]string{"Result": "User not destroyed"}, "User not destroyed")
}
deleteRequest := &v1.DeleteUserRequest{Id: user.GetId()}
response, err := client.DeleteUser(ctx, deleteRequest)
if err != nil {
return fmt.Errorf("destroying user: %w", err)
}
return printOutput(cmd, response, "User destroyed")
}),
}
var listUsersCmd = &cobra.Command{
Use: "list",
Short: "List all the users",
Aliases: []string{"ls", "show"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
request := &v1.ListUsersRequest{}
err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.ListUsersRequest{}
id, _ := cmd.Flags().GetInt64("identifier")
username, _ := cmd.Flags().GetString("name")
email, _ := cmd.Flags().GetString("email")
// Check for smart lookup flag first
userFlag, _ := cmd.Flags().GetString("user")
if userFlag != "" {
// Use smart lookup to determine filter type
if id, err := strconv.ParseUint(userFlag, 10, 64); err == nil && id > 0 {
request.Id = id
} else if strings.Contains(userFlag, "@") {
request.Email = userFlag
} else {
request.Name = userFlag
}
} else {
// Check specific filter flags
if id, _ := cmd.Flags().GetUint64("id"); id > 0 {
request.Id = id
} else if name, _ := cmd.Flags().GetString("name"); name != "" {
request.Name = name
} else if email, _ := cmd.Flags().GetString("email"); email != "" {
request.Email = email
}
}
// filter by one param at most
switch {
case id > 0:
request.Id = uint64(id)
case username != "":
request.Name = username
case email != "":
request.Email = email
}
response, err := client.ListUsers(ctx, request)
if err != nil {
ErrorOutput(
err,
"Cannot get users: "+status.Convert(err).Message(),
output,
)
return err
}
if output != "" {
SuccessOutput(response.GetUsers(), "", output)
return nil
}
response, err := client.ListUsers(ctx, request)
if err != nil {
return fmt.Errorf("listing users: %w", err)
}
return printListOutput(cmd, response.GetUsers(), func() error {
tableData := pterm.TableData{{"ID", "Name", "Username", "Email", "Created"}}
for _, user := range response.GetUsers() {
tableData = append(
tableData,
[]string{
strconv.FormatUint(user.GetId(), 10),
strconv.FormatUint(user.GetId(), util.Base10),
user.GetDisplayName(),
user.GetName(),
user.GetEmail(),
@@ -269,80 +196,48 @@ var listUsersCmd = &cobra.Command{
},
)
}
tableData = FilterTableColumns(cmd, tableData)
err = pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
if err != nil {
ErrorOutput(
err,
fmt.Sprintf("Failed to render pterm table: %s", err),
output,
)
return err
}
return nil
return pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
})
if err != nil {
// Error already handled in closure
return
}
},
}),
}
var renameUserCmd = &cobra.Command{
Use: "rename",
Short: "Renames a user",
Aliases: []string{"mv"},
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
id, username, err := usernameAndIDFromFlag(cmd)
if err != nil {
return err
}
listReq := &v1.ListUsersRequest{
Name: username,
Id: id,
}
users, err := client.ListUsers(ctx, listReq)
if err != nil {
return fmt.Errorf("listing users: %w", err)
}
if len(users.GetUsers()) != 1 {
return errMultipleUsersMatch
}
id := userIDFromFlag(cmd)
newName, _ := cmd.Flags().GetString("new-name")
err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
listReq := &v1.ListUsersRequest{
Id: id,
}
users, err := client.ListUsers(ctx, listReq)
if err != nil {
ErrorOutput(
err,
"Error: "+status.Convert(err).Message(),
output,
)
return err
}
if len(users.GetUsers()) != 1 {
err := errors.New("Unable to determine user to delete, query returned multiple users, use ID")
ErrorOutput(
err,
"Error: "+status.Convert(err).Message(),
output,
)
return err
}
renameReq := &v1.RenameUserRequest{
OldId: id,
NewName: newName,
}
response, err := client.RenameUser(ctx, renameReq)
if err != nil {
ErrorOutput(
err,
"Cannot rename user: "+status.Convert(err).Message(),
output,
)
return err
}
SuccessOutput(response.GetUser(), "User renamed", output)
return nil
})
if err != nil {
return
renameReq := &v1.RenameUserRequest{
OldId: id,
NewName: newName,
}
},
response, err := client.RenameUser(ctx, renameReq)
if err != nil {
return fmt.Errorf("renaming user: %w", err)
}
return printOutput(cmd, response.GetUser(), "User renamed")
}),
}

View File

@@ -4,24 +4,52 @@ import (
"context"
"crypto/tls"
"encoding/json"
"errors"
"fmt"
"net"
"os"
"strconv"
"strings"
"time"
v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
"github.com/juanfont/headscale/hscontrol"
"github.com/juanfont/headscale/hscontrol/types"
"github.com/juanfont/headscale/hscontrol/util"
"github.com/juanfont/headscale/hscontrol/util/zlog/zf"
"github.com/prometheus/common/model"
"github.com/rs/zerolog/log"
"github.com/spf13/cobra"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials"
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/protobuf/types/known/timestamppb"
"gopkg.in/yaml.v3"
)
const (
HeadscaleDateTimeFormat = "2006-01-02 15:04:05"
SocketWritePermissions = 0o666
outputFormatJSON = "json"
outputFormatJSONLine = "json-line"
outputFormatYAML = "yaml"
)
var (
errAPIKeyNotSet = errors.New("HEADSCALE_CLI_API_KEY environment variable needs to be set")
errMissingParameter = errors.New("missing parameters")
)
// mustMarkRequired marks the named flags as required on cmd, panicking
// if any name does not match a registered flag. This is only called
// from init() where a failure indicates a programming error.
func mustMarkRequired(cmd *cobra.Command, names ...string) {
for _, n := range names {
err := cmd.MarkFlagRequired(n)
if err != nil {
panic(fmt.Sprintf("marking flag %q required on %q: %v", n, cmd.Name(), err))
}
}
}
func newHeadscaleServerWithConfig() (*hscontrol.Headscale, error) {
cfg, err := types.LoadServerConfig()
if err != nil {
@@ -39,14 +67,28 @@ func newHeadscaleServerWithConfig() (*hscontrol.Headscale, error) {
return app, nil
}
func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *grpc.ClientConn, context.CancelFunc) {
// grpcRunE wraps a cobra RunE func, injecting a ready gRPC client and
// context. Connection lifecycle is managed by the wrapper — callers
// never see the underlying conn or cancel func.
func grpcRunE(
fn func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error,
) func(*cobra.Command, []string) error {
return func(cmd *cobra.Command, args []string) error {
ctx, client, conn, cancel, err := newHeadscaleCLIWithConfig()
if err != nil {
return fmt.Errorf("connecting to headscale: %w", err)
}
defer cancel()
defer conn.Close()
return fn(ctx, client, cmd, args)
}
}
func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *grpc.ClientConn, context.CancelFunc, error) {
cfg, err := types.LoadCLIConfig()
if err != nil {
log.Fatal().
Err(err).
Caller().
Msgf("Failed to load configuration")
os.Exit(-1) // we get here if logging is suppressed (i.e., json output)
return nil, nil, nil, nil, fmt.Errorf("loading configuration: %w", err)
}
log.Debug().
@@ -56,7 +98,7 @@ func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *g
ctx, cancel := context.WithTimeout(context.Background(), cfg.CLI.Timeout)
grpcOptions := []grpc.DialOption{
grpc.WithBlock(),
grpc.WithBlock(), //nolint:staticcheck // SA1019: deprecated but supported in 1.x
}
address := cfg.CLI.Address
@@ -70,17 +112,23 @@ func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *g
address = cfg.UnixSocket
// Try to give the user better feedback if we cannot write to the headscale
// socket.
socket, err := os.OpenFile(cfg.UnixSocket, os.O_WRONLY, 0o666) // nolint
// socket. Note: os.OpenFile on a Unix domain socket returns ENXIO on
// Linux which is expected — only permission errors are actionable here.
// The actual gRPC connection uses net.Dial which handles sockets properly.
socket, err := os.OpenFile(cfg.UnixSocket, os.O_WRONLY, SocketWritePermissions) //nolint
if err != nil {
if os.IsPermission(err) {
log.Fatal().
Err(err).
Str("socket", cfg.UnixSocket).
Msgf("Unable to read/write to headscale socket, do you have the correct permissions?")
cancel()
return nil, nil, nil, nil, fmt.Errorf(
"unable to read/write to headscale socket %q, do you have the correct permissions? %w",
cfg.UnixSocket,
err,
)
}
} else {
socket.Close()
}
socket.Close()
grpcOptions = append(
grpcOptions,
@@ -91,8 +139,11 @@ func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *g
// If we are not connecting to a local server, require an API key for authentication
apiKey := cfg.CLI.APIKey
if apiKey == "" {
log.Fatal().Caller().Msgf("HEADSCALE_CLI_API_KEY environment variable needs to be set.")
cancel()
return nil, nil, nil, nil, errAPIKeyNotSet
}
grpcOptions = append(grpcOptions,
grpc.WithPerRPCCredentials(tokenAuth{
token: apiKey,
@@ -117,64 +168,136 @@ func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *g
}
}
log.Trace().Caller().Str("address", address).Msg("Connecting via gRPC")
conn, err := grpc.DialContext(ctx, address, grpcOptions...)
log.Trace().Caller().Str(zf.Address, address).Msg("connecting via gRPC")
conn, err := grpc.DialContext(ctx, address, grpcOptions...) //nolint:staticcheck // SA1019: deprecated but supported in 1.x
if err != nil {
log.Fatal().Caller().Err(err).Msgf("Could not connect: %v", err)
os.Exit(-1) // we get here if logging is suppressed (i.e., json output)
cancel()
return nil, nil, nil, nil, fmt.Errorf("connecting to %s: %w", address, err)
}
client := v1.NewHeadscaleServiceClient(conn)
return ctx, client, conn, cancel
return ctx, client, conn, cancel, nil
}
func output(result interface{}, override string, outputFormat string) string {
var jsonBytes []byte
var err error
// formatOutput serialises result into the requested format. For the
// default (empty) format the human-readable override string is returned.
func formatOutput(result any, override string, outputFormat string) (string, error) {
switch outputFormat {
case "json":
jsonBytes, err = json.MarshalIndent(result, "", "\t")
case outputFormatJSON:
b, err := json.MarshalIndent(result, "", "\t")
if err != nil {
log.Fatal().Err(err).Msg("failed to unmarshal output")
return "", fmt.Errorf("marshalling JSON output: %w", err)
}
case "json-line":
jsonBytes, err = json.Marshal(result)
return string(b), nil
case outputFormatJSONLine:
b, err := json.Marshal(result)
if err != nil {
log.Fatal().Err(err).Msg("failed to unmarshal output")
return "", fmt.Errorf("marshalling JSON-line output: %w", err)
}
case "yaml":
jsonBytes, err = yaml.Marshal(result)
return string(b), nil
case outputFormatYAML:
b, err := yaml.Marshal(result)
if err != nil {
log.Fatal().Err(err).Msg("failed to unmarshal output")
return "", fmt.Errorf("marshalling YAML output: %w", err)
}
return string(b), nil
default:
// nolint
return override
return override, nil
}
}
// printOutput formats result and writes it to stdout. It reads the --output
// flag from cmd to decide the serialisation format.
func printOutput(cmd *cobra.Command, result any, override string) error {
format, _ := cmd.Flags().GetString("output")
out, err := formatOutput(result, override, format)
if err != nil {
return err
}
return string(jsonBytes)
fmt.Println(out)
return nil
}
// SuccessOutput prints the result to stdout and exits with status code 0.
func SuccessOutput(result interface{}, override string, outputFormat string) {
fmt.Println(output(result, override, outputFormat))
os.Exit(0)
// expirationFromFlag parses the --expiration flag as a Prometheus-style
// duration (e.g. "90d", "1h") and returns an absolute timestamp.
func expirationFromFlag(cmd *cobra.Command) (*timestamppb.Timestamp, error) {
durationStr, _ := cmd.Flags().GetString("expiration")
duration, err := model.ParseDuration(durationStr)
if err != nil {
return nil, fmt.Errorf("parsing duration: %w", err)
}
return timestamppb.New(time.Now().UTC().Add(time.Duration(duration))), nil
}
// ErrorOutput prints an error message to stderr and exits with status code 1.
func ErrorOutput(errResult error, override string, outputFormat string) {
// confirmAction returns true when the user confirms a prompt, or when
// --force is set. Callers decide what to do when it returns false.
func confirmAction(cmd *cobra.Command, prompt string) bool {
force, _ := cmd.Flags().GetBool("force")
if force {
return true
}
return util.YesNo(prompt)
}
// printListOutput checks the --output flag: when a machine-readable format is
// requested it serialises data as JSON/YAML; otherwise it calls renderTable
// to produce the human-readable pterm table.
func printListOutput(
cmd *cobra.Command,
data any,
renderTable func() error,
) error {
format, _ := cmd.Flags().GetString("output")
if format != "" {
return printOutput(cmd, data, "")
}
return renderTable()
}
// printError writes err to stderr, formatting it as JSON/YAML when the
// --output flag requests machine-readable output. Used exclusively by
// Execute() so that every error surfaces in the format the caller asked for.
func printError(err error, outputFormat string) {
type errOutput struct {
Error string `json:"error"`
}
fmt.Fprintf(os.Stderr, "%s\n", output(errOutput{errResult.Error()}, override, outputFormat))
os.Exit(1)
e := errOutput{Error: err.Error()}
var formatted []byte
switch outputFormat {
case outputFormatJSON:
formatted, _ = json.MarshalIndent(e, "", "\t") //nolint:errchkjson // errOutput contains only a string field
case outputFormatJSONLine:
formatted, _ = json.Marshal(e) //nolint:errchkjson // errOutput contains only a string field
case outputFormatYAML:
formatted, _ = yaml.Marshal(e)
default:
fmt.Fprintf(os.Stderr, "Error: %s\n", err)
return
}
fmt.Fprintf(os.Stderr, "%s\n", formatted)
}
func HasMachineOutputFlag() bool {
func hasMachineOutputFlag() bool {
for _, arg := range os.Args {
if arg == "json" || arg == "json-line" || arg == "yaml" {
if arg == outputFormatJSON || arg == outputFormatJSONLine || arg == outputFormatYAML {
return true
}
}
@@ -199,152 +322,3 @@ func (t tokenAuth) GetRequestMetadata(
func (tokenAuth) RequireTransportSecurity() bool {
return true
}
// GetOutputFlag returns the output flag value (never fails)
func GetOutputFlag(cmd *cobra.Command) string {
output, _ := cmd.Flags().GetString("output")
return output
}
// GetNodeIdentifier returns the node ID using smart lookup via gRPC ListNodes call
func GetNodeIdentifier(cmd *cobra.Command) (uint64, error) {
nodeFlag, _ := cmd.Flags().GetString("node")
// Use --node flag
if nodeFlag == "" {
return 0, fmt.Errorf("--node flag is required")
}
// Use smart lookup via gRPC
return lookupNodeBySpecifier(nodeFlag)
}
// lookupNodeBySpecifier performs smart lookup of a node by ID, name, hostname, or IP
func lookupNodeBySpecifier(specifier string) (uint64, error) {
var nodeID uint64
err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.ListNodesRequest{}
// Detect what type of specifier this is and set appropriate filter
if id, err := strconv.ParseUint(specifier, 10, 64); err == nil && id > 0 {
// Looks like a numeric ID
request.Id = id
} else if isIPAddress(specifier) {
// Looks like an IP address
request.IpAddresses = []string{specifier}
} else {
// Treat as hostname/name
request.Name = specifier
}
response, err := client.ListNodes(ctx, request)
if err != nil {
return fmt.Errorf("failed to lookup node: %w", err)
}
nodes := response.GetNodes()
if len(nodes) == 0 {
return fmt.Errorf("node not found")
}
if len(nodes) > 1 {
var nodeInfo []string
for _, node := range nodes {
nodeInfo = append(nodeInfo, fmt.Sprintf("ID=%d name=%s", node.GetId(), node.GetName()))
}
return fmt.Errorf("multiple nodes found matching '%s': %s", specifier, strings.Join(nodeInfo, ", "))
}
// Exactly one match - this is what we want
nodeID = nodes[0].GetId()
return nil
})
if err != nil {
return 0, err
}
return nodeID, nil
}
// isIPAddress checks if a string looks like an IP address
func isIPAddress(s string) bool {
// Try parsing as IP address (both IPv4 and IPv6)
if net.ParseIP(s) != nil {
return true
}
// Try parsing as CIDR
if _, _, err := net.ParseCIDR(s); err == nil {
return true
}
return false
}
// GetUserIdentifier returns the user ID using smart lookup via gRPC ListUsers call
func GetUserIdentifier(cmd *cobra.Command) (uint64, error) {
userFlag, _ := cmd.Flags().GetString("user")
nameFlag, _ := cmd.Flags().GetString("name")
var specifier string
// Determine which flag was used (prefer --user, fall back to legacy flags)
if userFlag != "" {
specifier = userFlag
} else if nameFlag != "" {
specifier = nameFlag
} else {
return 0, fmt.Errorf("--user flag is required")
}
// Use smart lookup via gRPC
return lookupUserBySpecifier(specifier)
}
// lookupUserBySpecifier performs smart lookup of a user by ID, name, or email
func lookupUserBySpecifier(specifier string) (uint64, error) {
var userID uint64
err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
request := &v1.ListUsersRequest{}
// Detect what type of specifier this is and set appropriate filter
if id, err := strconv.ParseUint(specifier, 10, 64); err == nil && id > 0 {
// Looks like a numeric ID
request.Id = id
} else if strings.Contains(specifier, "@") {
// Looks like an email address
request.Email = specifier
} else {
// Treat as username
request.Name = specifier
}
response, err := client.ListUsers(ctx, request)
if err != nil {
return fmt.Errorf("failed to lookup user: %w", err)
}
users := response.GetUsers()
if len(users) == 0 {
return fmt.Errorf("user not found")
}
if len(users) > 1 {
var userInfo []string
for _, user := range users {
userInfo = append(userInfo, fmt.Sprintf("ID=%d name=%s email=%s", user.GetId(), user.GetName(), user.GetEmail()))
}
return fmt.Errorf("multiple users found matching '%s': %s", specifier, strings.Join(userInfo, ", "))
}
// Exactly one match - this is what we want
userID = users[0].GetId()
return nil
})
if err != nil {
return 0, err
}
return userID, nil
}

View File

@@ -1,175 +0,0 @@
package cli
import (
"os"
"testing"
"github.com/stretchr/testify/assert"
)
func TestHasMachineOutputFlag(t *testing.T) {
tests := []struct {
name string
args []string
expected bool
}{
{
name: "no machine output flags",
args: []string{"headscale", "users", "list"},
expected: false,
},
{
name: "json flag present",
args: []string{"headscale", "users", "list", "json"},
expected: true,
},
{
name: "json-line flag present",
args: []string{"headscale", "nodes", "list", "json-line"},
expected: true,
},
{
name: "yaml flag present",
args: []string{"headscale", "apikeys", "list", "yaml"},
expected: true,
},
{
name: "mixed flags with json",
args: []string{"headscale", "--config", "/tmp/config.yaml", "users", "list", "json"},
expected: true,
},
{
name: "flag as part of longer argument",
args: []string{"headscale", "users", "create", "json-user@example.com"},
expected: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Save original os.Args
originalArgs := os.Args
defer func() { os.Args = originalArgs }()
// Set os.Args to test case
os.Args = tt.args
result := HasMachineOutputFlag()
assert.Equal(t, tt.expected, result)
})
}
}
func TestOutput(t *testing.T) {
tests := []struct {
name string
result interface{}
override string
outputFormat string
expected string
}{
{
name: "default format returns override",
result: map[string]string{"test": "value"},
override: "Human readable output",
outputFormat: "",
expected: "Human readable output",
},
{
name: "default format with empty override",
result: map[string]string{"test": "value"},
override: "",
outputFormat: "",
expected: "",
},
{
name: "json format",
result: map[string]string{"name": "test", "id": "123"},
override: "Human readable",
outputFormat: "json",
expected: "{\n\t\"id\": \"123\",\n\t\"name\": \"test\"\n}",
},
{
name: "json-line format",
result: map[string]string{"name": "test", "id": "123"},
override: "Human readable",
outputFormat: "json-line",
expected: "{\"id\":\"123\",\"name\":\"test\"}",
},
{
name: "yaml format",
result: map[string]string{"name": "test", "id": "123"},
override: "Human readable",
outputFormat: "yaml",
expected: "id: \"123\"\nname: test\n",
},
{
name: "invalid format returns override",
result: map[string]string{"test": "value"},
override: "Human readable output",
outputFormat: "invalid",
expected: "Human readable output",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := output(tt.result, tt.override, tt.outputFormat)
assert.Equal(t, tt.expected, result)
})
}
}
func TestOutputWithComplexData(t *testing.T) {
// Test with more complex data structures
complexData := struct {
Users []struct {
Name string `json:"name" yaml:"name"`
ID int `json:"id" yaml:"id"`
} `json:"users" yaml:"users"`
}{
Users: []struct {
Name string `json:"name" yaml:"name"`
ID int `json:"id" yaml:"id"`
}{
{Name: "user1", ID: 1},
{Name: "user2", ID: 2},
},
}
// Test JSON output
jsonResult := output(complexData, "override", "json")
assert.Contains(t, jsonResult, "\"users\":")
assert.Contains(t, jsonResult, "\"name\": \"user1\"")
assert.Contains(t, jsonResult, "\"id\": 1")
// Test YAML output
yamlResult := output(complexData, "override", "yaml")
assert.Contains(t, yamlResult, "users:")
assert.Contains(t, yamlResult, "name: user1")
assert.Contains(t, yamlResult, "id: 1")
}
func TestOutputWithNilData(t *testing.T) {
// Test with nil data
result := output(nil, "fallback", "json")
assert.Equal(t, "null", result)
result = output(nil, "fallback", "yaml")
assert.Equal(t, "null\n", result)
result = output(nil, "fallback", "")
assert.Equal(t, "fallback", result)
}
func TestOutputWithEmptyData(t *testing.T) {
// Test with empty slice
emptySlice := []string{}
result := output(emptySlice, "fallback", "json")
assert.Equal(t, "[]", result)
// Test with empty map
emptyMap := map[string]string{}
result = output(emptyMap, "fallback", "json")
assert.Equal(t, "{}", result)
}

View File

@@ -7,17 +7,16 @@ import (
func init() {
rootCmd.AddCommand(versionCmd)
versionCmd.Flags().StringP("output", "o", "", "Output format. Empty for human-readable, 'json', 'json-line' or 'yaml'")
}
var versionCmd = &cobra.Command{
Use: "version",
Short: "Print the version",
Long: "The version of headscale",
Run: func(cmd *cobra.Command, args []string) {
output := GetOutputFlag(cmd)
SuccessOutput(map[string]string{
"version": types.Version,
"commit": types.GitCommitHash,
}, types.Version, output)
Short: "Print the version.",
Long: "The version of headscale.",
RunE: func(cmd *cobra.Command, args []string) error {
info := types.GetVersionInfo()
return printOutput(cmd, info, info.String())
},
}

View File

@@ -1,45 +0,0 @@
package cli
import (
"testing"
"github.com/stretchr/testify/assert"
)
func TestVersionCommand(t *testing.T) {
// Test that version command exists
assert.NotNil(t, versionCmd)
assert.Equal(t, "version", versionCmd.Use)
assert.Equal(t, "Print the version.", versionCmd.Short)
assert.Equal(t, "The version of headscale.", versionCmd.Long)
}
func TestVersionCommandStructure(t *testing.T) {
// Test command is properly added to root
found := false
for _, cmd := range rootCmd.Commands() {
if cmd.Use == "version" {
found = true
break
}
}
assert.True(t, found, "version command should be added to root command")
}
func TestVersionCommandFlags(t *testing.T) {
// Version command should inherit output flag from root as persistent flag
outputFlag := versionCmd.Flag("output")
if outputFlag == nil {
// Try persistent flags from root
outputFlag = rootCmd.PersistentFlags().Lookup("output")
}
assert.NotNil(t, outputFlag, "version command should have access to output flag")
}
func TestVersionCommandRun(t *testing.T) {
// Test that Run function is set
assert.NotNil(t, versionCmd.Run)
// We can't easily test the actual execution without mocking SuccessOutput
// but we can verify the function exists and has the right signature
}

View File

@@ -12,6 +12,7 @@ import (
func main() {
var colors bool
switch l := termcolor.SupportLevel(os.Stderr); l {
case termcolor.Level16M:
colors = true

View File

@@ -9,34 +9,15 @@ import (
"github.com/juanfont/headscale/hscontrol/types"
"github.com/juanfont/headscale/hscontrol/util"
"github.com/spf13/viper"
"gopkg.in/check.v1"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func Test(t *testing.T) {
check.TestingT(t)
}
var _ = check.Suite(&Suite{})
type Suite struct{}
func (s *Suite) SetUpSuite(c *check.C) {
}
func (s *Suite) TearDownSuite(c *check.C) {
}
func (*Suite) TestConfigFileLoading(c *check.C) {
tmpDir, err := os.MkdirTemp("", "headscale")
if err != nil {
c.Fatal(err)
}
defer os.RemoveAll(tmpDir)
func TestConfigFileLoading(t *testing.T) {
tmpDir := t.TempDir()
path, err := os.Getwd()
if err != nil {
c.Fatal(err)
}
require.NoError(t, err)
cfgFile := filepath.Join(tmpDir, "config.yaml")
@@ -45,70 +26,52 @@ func (*Suite) TestConfigFileLoading(c *check.C) {
filepath.Clean(path+"/../../config-example.yaml"),
cfgFile,
)
if err != nil {
c.Fatal(err)
}
require.NoError(t, err)
// Load example config, it should load without validation errors
err = types.LoadConfig(cfgFile, true)
c.Assert(err, check.IsNil)
require.NoError(t, err)
// Test that config file was interpreted correctly
c.Assert(viper.GetString("server_url"), check.Equals, "http://127.0.0.1:8080")
c.Assert(viper.GetString("listen_addr"), check.Equals, "127.0.0.1:8080")
c.Assert(viper.GetString("metrics_listen_addr"), check.Equals, "127.0.0.1:9090")
c.Assert(viper.GetString("database.type"), check.Equals, "sqlite")
c.Assert(viper.GetString("database.sqlite.path"), check.Equals, "/var/lib/headscale/db.sqlite")
c.Assert(viper.GetString("tls_letsencrypt_hostname"), check.Equals, "")
c.Assert(viper.GetString("tls_letsencrypt_listen"), check.Equals, ":http")
c.Assert(viper.GetString("tls_letsencrypt_challenge_type"), check.Equals, "HTTP-01")
c.Assert(
util.GetFileMode("unix_socket_permission"),
check.Equals,
fs.FileMode(0o770),
)
c.Assert(viper.GetBool("logtail.enabled"), check.Equals, false)
assert.Equal(t, "http://127.0.0.1:8080", viper.GetString("server_url"))
assert.Equal(t, "127.0.0.1:8080", viper.GetString("listen_addr"))
assert.Equal(t, "127.0.0.1:9090", viper.GetString("metrics_listen_addr"))
assert.Equal(t, "sqlite", viper.GetString("database.type"))
assert.Equal(t, "/var/lib/headscale/db.sqlite", viper.GetString("database.sqlite.path"))
assert.Empty(t, viper.GetString("tls_letsencrypt_hostname"))
assert.Equal(t, ":http", viper.GetString("tls_letsencrypt_listen"))
assert.Equal(t, "HTTP-01", viper.GetString("tls_letsencrypt_challenge_type"))
assert.Equal(t, fs.FileMode(0o770), util.GetFileMode("unix_socket_permission"))
assert.False(t, viper.GetBool("logtail.enabled"))
}
func (*Suite) TestConfigLoading(c *check.C) {
tmpDir, err := os.MkdirTemp("", "headscale")
if err != nil {
c.Fatal(err)
}
defer os.RemoveAll(tmpDir)
func TestConfigLoading(t *testing.T) {
tmpDir := t.TempDir()
path, err := os.Getwd()
if err != nil {
c.Fatal(err)
}
require.NoError(t, err)
// Symlink the example config file
err = os.Symlink(
filepath.Clean(path+"/../../config-example.yaml"),
filepath.Join(tmpDir, "config.yaml"),
)
if err != nil {
c.Fatal(err)
}
require.NoError(t, err)
// Load example config, it should load without validation errors
err = types.LoadConfig(tmpDir, false)
c.Assert(err, check.IsNil)
require.NoError(t, err)
// Test that config file was interpreted correctly
c.Assert(viper.GetString("server_url"), check.Equals, "http://127.0.0.1:8080")
c.Assert(viper.GetString("listen_addr"), check.Equals, "127.0.0.1:8080")
c.Assert(viper.GetString("metrics_listen_addr"), check.Equals, "127.0.0.1:9090")
c.Assert(viper.GetString("database.type"), check.Equals, "sqlite")
c.Assert(viper.GetString("database.sqlite.path"), check.Equals, "/var/lib/headscale/db.sqlite")
c.Assert(viper.GetString("tls_letsencrypt_hostname"), check.Equals, "")
c.Assert(viper.GetString("tls_letsencrypt_listen"), check.Equals, ":http")
c.Assert(viper.GetString("tls_letsencrypt_challenge_type"), check.Equals, "HTTP-01")
c.Assert(
util.GetFileMode("unix_socket_permission"),
check.Equals,
fs.FileMode(0o770),
)
c.Assert(viper.GetBool("logtail.enabled"), check.Equals, false)
c.Assert(viper.GetBool("randomize_client_port"), check.Equals, false)
assert.Equal(t, "http://127.0.0.1:8080", viper.GetString("server_url"))
assert.Equal(t, "127.0.0.1:8080", viper.GetString("listen_addr"))
assert.Equal(t, "127.0.0.1:9090", viper.GetString("metrics_listen_addr"))
assert.Equal(t, "sqlite", viper.GetString("database.type"))
assert.Equal(t, "/var/lib/headscale/db.sqlite", viper.GetString("database.sqlite.path"))
assert.Empty(t, viper.GetString("tls_letsencrypt_hostname"))
assert.Equal(t, ":http", viper.GetString("tls_letsencrypt_listen"))
assert.Equal(t, "HTTP-01", viper.GetString("tls_letsencrypt_challenge_type"))
assert.Equal(t, fs.FileMode(0o770), util.GetFileMode("unix_socket_permission"))
assert.False(t, viper.GetBool("logtail.enabled"))
assert.False(t, viper.GetBool("randomize_client_port"))
}

6
cmd/hi/README.md Normal file
View File

@@ -0,0 +1,6 @@
# hi
hi (headscale integration runner) is an entirely "vibe coded" wrapper around our
[integration test suite](../integration). It essentially runs the docker
commands for you with some added benefits of extracting resources like logs and
databases.

View File

@@ -3,9 +3,13 @@ package main
import (
"context"
"fmt"
"log"
"os"
"path/filepath"
"strings"
"time"
"github.com/cenkalti/backoff/v5"
"github.com/docker/docker/api/types/container"
"github.com/docker/docker/api/types/filters"
"github.com/docker/docker/api/types/image"
@@ -14,30 +18,46 @@ import (
)
// cleanupBeforeTest performs cleanup operations before running tests.
// Only removes stale (stopped/exited) test containers to avoid interfering with concurrent test runs.
func cleanupBeforeTest(ctx context.Context) error {
if err := killTestContainers(ctx); err != nil {
return fmt.Errorf("failed to kill test containers: %w", err)
err := cleanupStaleTestContainers(ctx)
if err != nil {
return fmt.Errorf("cleaning stale test containers: %w", err)
}
if err := pruneDockerNetworks(ctx); err != nil {
return fmt.Errorf("failed to prune networks: %w", err)
if err := pruneDockerNetworks(ctx); err != nil { //nolint:noinlineerr
return fmt.Errorf("pruning networks: %w", err)
}
return nil
}
// cleanupAfterTest removes the test container after completion.
func cleanupAfterTest(ctx context.Context, cli *client.Client, containerID string) error {
return cli.ContainerRemove(ctx, containerID, container.RemoveOptions{
// cleanupAfterTest removes the test container and all associated integration test containers for the run.
func cleanupAfterTest(ctx context.Context, cli *client.Client, containerID, runID string) error {
// Remove the main test container
err := cli.ContainerRemove(ctx, containerID, container.RemoveOptions{
Force: true,
})
if err != nil {
return fmt.Errorf("removing test container: %w", err)
}
// Clean up integration test containers for this run only
if runID != "" {
err := killTestContainersByRunID(ctx, runID)
if err != nil {
return fmt.Errorf("cleaning up containers for run %s: %w", runID, err)
}
}
return nil
}
// killTestContainers terminates and removes all test containers.
func killTestContainers(ctx context.Context) error {
cli, err := createDockerClient()
cli, err := createDockerClient(ctx)
if err != nil {
return fmt.Errorf("failed to create Docker client: %w", err)
return fmt.Errorf("creating Docker client: %w", err)
}
defer cli.Close()
@@ -45,12 +65,14 @@ func killTestContainers(ctx context.Context) error {
All: true,
})
if err != nil {
return fmt.Errorf("failed to list containers: %w", err)
return fmt.Errorf("listing containers: %w", err)
}
removed := 0
for _, cont := range containers {
shouldRemove := false
for _, name := range cont.Names {
if strings.Contains(name, "headscale-test-suite") ||
strings.Contains(name, "hs-") ||
@@ -83,43 +105,135 @@ func killTestContainers(ctx context.Context) error {
return nil
}
// killTestContainersByRunID terminates and removes all test containers for a specific run ID.
// This function filters containers by the hi.run-id label to only affect containers
// belonging to the specified test run, leaving other concurrent test runs untouched.
func killTestContainersByRunID(ctx context.Context, runID string) error {
cli, err := createDockerClient(ctx)
if err != nil {
return fmt.Errorf("creating Docker client: %w", err)
}
defer cli.Close()
// Filter containers by hi.run-id label
containers, err := cli.ContainerList(ctx, container.ListOptions{
All: true,
Filters: filters.NewArgs(
filters.Arg("label", "hi.run-id="+runID),
),
})
if err != nil {
return fmt.Errorf("listing containers for run %s: %w", runID, err)
}
removed := 0
for _, cont := range containers {
// Kill the container if it's running
if cont.State == "running" {
_ = cli.ContainerKill(ctx, cont.ID, "KILL")
}
// Remove the container with retry logic
if removeContainerWithRetry(ctx, cli, cont.ID) {
removed++
}
}
if removed > 0 {
fmt.Printf("Removed %d containers for run ID %s\n", removed, runID)
}
return nil
}
// cleanupStaleTestContainers removes stopped/exited test containers without affecting running tests.
// This is useful for cleaning up leftover containers from previous crashed or interrupted test runs
// without interfering with currently running concurrent tests.
func cleanupStaleTestContainers(ctx context.Context) error {
cli, err := createDockerClient(ctx)
if err != nil {
return fmt.Errorf("creating Docker client: %w", err)
}
defer cli.Close()
// Only get stopped/exited containers
containers, err := cli.ContainerList(ctx, container.ListOptions{
All: true,
Filters: filters.NewArgs(
filters.Arg("status", "exited"),
filters.Arg("status", "dead"),
),
})
if err != nil {
return fmt.Errorf("listing stopped containers: %w", err)
}
removed := 0
for _, cont := range containers {
// Only remove containers that look like test containers
shouldRemove := false
for _, name := range cont.Names {
if strings.Contains(name, "headscale-test-suite") ||
strings.Contains(name, "hs-") ||
strings.Contains(name, "ts-") ||
strings.Contains(name, "derp-") {
shouldRemove = true
break
}
}
if shouldRemove {
if removeContainerWithRetry(ctx, cli, cont.ID) {
removed++
}
}
}
if removed > 0 {
fmt.Printf("Removed %d stale test containers\n", removed)
}
return nil
}
const (
containerRemoveInitialInterval = 100 * time.Millisecond
containerRemoveMaxElapsedTime = 2 * time.Second
)
// removeContainerWithRetry attempts to remove a container with exponential backoff retry logic.
func removeContainerWithRetry(ctx context.Context, cli *client.Client, containerID string) bool {
maxRetries := 3
baseDelay := 100 * time.Millisecond
expBackoff := backoff.NewExponentialBackOff()
expBackoff.InitialInterval = containerRemoveInitialInterval
for attempt := range maxRetries {
_, err := backoff.Retry(ctx, func() (struct{}, error) {
err := cli.ContainerRemove(ctx, containerID, container.RemoveOptions{
Force: true,
})
if err == nil {
return true
if err != nil {
return struct{}{}, err
}
// If this is the last attempt, don't wait
if attempt == maxRetries-1 {
break
}
return struct{}{}, nil
}, backoff.WithBackOff(expBackoff), backoff.WithMaxElapsedTime(containerRemoveMaxElapsedTime))
// Wait with exponential backoff
delay := baseDelay * time.Duration(1<<attempt)
time.Sleep(delay)
}
return false
return err == nil
}
// pruneDockerNetworks removes unused Docker networks.
func pruneDockerNetworks(ctx context.Context) error {
cli, err := createDockerClient()
cli, err := createDockerClient(ctx)
if err != nil {
return fmt.Errorf("failed to create Docker client: %w", err)
return fmt.Errorf("creating Docker client: %w", err)
}
defer cli.Close()
report, err := cli.NetworksPrune(ctx, filters.Args{})
if err != nil {
return fmt.Errorf("failed to prune networks: %w", err)
return fmt.Errorf("pruning networks: %w", err)
}
if len(report.NetworksDeleted) > 0 {
@@ -133,9 +247,9 @@ func pruneDockerNetworks(ctx context.Context) error {
// cleanOldImages removes test-related and old dangling Docker images.
func cleanOldImages(ctx context.Context) error {
cli, err := createDockerClient()
cli, err := createDockerClient(ctx)
if err != nil {
return fmt.Errorf("failed to create Docker client: %w", err)
return fmt.Errorf("creating Docker client: %w", err)
}
defer cli.Close()
@@ -143,12 +257,14 @@ func cleanOldImages(ctx context.Context) error {
All: true,
})
if err != nil {
return fmt.Errorf("failed to list images: %w", err)
return fmt.Errorf("listing images: %w", err)
}
removed := 0
for _, img := range images {
shouldRemove := false
for _, tag := range img.RepoTags {
if strings.Contains(tag, "hs-") ||
strings.Contains(tag, "headscale-integration") ||
@@ -183,18 +299,19 @@ func cleanOldImages(ctx context.Context) error {
// cleanCacheVolume removes the Docker volume used for Go module cache.
func cleanCacheVolume(ctx context.Context) error {
cli, err := createDockerClient()
cli, err := createDockerClient(ctx)
if err != nil {
return fmt.Errorf("failed to create Docker client: %w", err)
return fmt.Errorf("creating Docker client: %w", err)
}
defer cli.Close()
volumeName := "hs-integration-go-cache"
err = cli.VolumeRemove(ctx, volumeName, true)
if err != nil {
if errdefs.IsNotFound(err) {
if errdefs.IsNotFound(err) { //nolint:staticcheck // SA1019: deprecated but functional
fmt.Printf("Go module cache volume not found: %s\n", volumeName)
} else if errdefs.IsConflict(err) {
} else if errdefs.IsConflict(err) { //nolint:staticcheck // SA1019: deprecated but functional
fmt.Printf("Go module cache volume is in use and cannot be removed: %s\n", volumeName)
} else {
fmt.Printf("Failed to remove Go module cache volume %s: %v\n", volumeName, err)
@@ -205,3 +322,110 @@ func cleanCacheVolume(ctx context.Context) error {
return nil
}
// cleanupSuccessfulTestArtifacts removes artifacts from successful test runs to save disk space.
// This function removes large artifacts that are mainly useful for debugging failures:
// - Database dumps (.db files)
// - Profile data (pprof directories)
// - MapResponse data (mapresponses directories)
// - Prometheus metrics files
//
// It preserves:
// - Log files (.log) which are small and useful for verification.
func cleanupSuccessfulTestArtifacts(logsDir string, verbose bool) error {
entries, err := os.ReadDir(logsDir)
if err != nil {
return fmt.Errorf("reading logs directory: %w", err)
}
var (
removedFiles, removedDirs int
totalSize int64
)
for _, entry := range entries {
name := entry.Name()
fullPath := filepath.Join(logsDir, name)
if entry.IsDir() {
// Remove pprof and mapresponses directories (typically large)
// These directories contain artifacts from all containers in the test run
if name == "pprof" || name == "mapresponses" {
size, sizeErr := getDirSize(fullPath)
if sizeErr == nil {
totalSize += size
}
err := os.RemoveAll(fullPath)
if err != nil {
if verbose {
log.Printf("Warning: failed to remove directory %s: %v", name, err)
}
} else {
removedDirs++
if verbose {
log.Printf("Removed directory: %s/", name)
}
}
}
} else {
// Only process test-related files (headscale and tailscale)
if !strings.HasPrefix(name, "hs-") && !strings.HasPrefix(name, "ts-") {
continue
}
// Remove database, metrics, and status files, but keep logs
shouldRemove := strings.HasSuffix(name, ".db") ||
strings.HasSuffix(name, "_metrics.txt") ||
strings.HasSuffix(name, "_status.json")
if shouldRemove {
info, infoErr := entry.Info()
if infoErr == nil {
totalSize += info.Size()
}
err := os.Remove(fullPath)
if err != nil {
if verbose {
log.Printf("Warning: failed to remove file %s: %v", name, err)
}
} else {
removedFiles++
if verbose {
log.Printf("Removed file: %s", name)
}
}
}
}
}
if removedFiles > 0 || removedDirs > 0 {
const bytesPerMB = 1024 * 1024
log.Printf("Cleaned up %d files and %d directories (freed ~%.2f MB)",
removedFiles, removedDirs, float64(totalSize)/bytesPerMB)
}
return nil
}
// getDirSize calculates the total size of a directory.
func getDirSize(path string) (int64, error) {
var size int64
err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if !info.IsDir() {
size += info.Size()
}
return nil
})
return size, err
}

View File

@@ -22,17 +22,22 @@ import (
"github.com/juanfont/headscale/integration/dockertestutil"
)
const defaultDirPerm = 0o755
var (
ErrTestFailed = errors.New("test failed")
ErrUnexpectedContainerWait = errors.New("unexpected end of container wait")
ErrNoDockerContext = errors.New("no docker context found")
ErrMemoryLimitViolations = errors.New("container(s) exceeded memory limits")
)
// runTestContainer executes integration tests in a Docker container.
//
//nolint:gocyclo // complex test orchestration function
func runTestContainer(ctx context.Context, config *RunConfig) error {
cli, err := createDockerClient()
cli, err := createDockerClient(ctx)
if err != nil {
return fmt.Errorf("failed to create Docker client: %w", err)
return fmt.Errorf("creating Docker client: %w", err)
}
defer cli.Close()
@@ -48,19 +53,21 @@ func runTestContainer(ctx context.Context, config *RunConfig) error {
absLogsDir, err := filepath.Abs(logsDir)
if err != nil {
return fmt.Errorf("failed to get absolute path for logs directory: %w", err)
return fmt.Errorf("getting absolute path for logs directory: %w", err)
}
const dirPerm = 0o755
if err := os.MkdirAll(absLogsDir, dirPerm); err != nil {
return fmt.Errorf("failed to create logs directory: %w", err)
if err := os.MkdirAll(absLogsDir, dirPerm); err != nil { //nolint:noinlineerr
return fmt.Errorf("creating logs directory: %w", err)
}
if config.CleanBefore {
if config.Verbose {
log.Printf("Running pre-test cleanup...")
}
if err := cleanupBeforeTest(ctx); err != nil && config.Verbose {
err := cleanupBeforeTest(ctx)
if err != nil && config.Verbose {
log.Printf("Warning: pre-test cleanup failed: %v", err)
}
}
@@ -71,52 +78,118 @@ func runTestContainer(ctx context.Context, config *RunConfig) error {
}
imageName := "golang:" + config.GoVersion
if err := ensureImageAvailable(ctx, cli, imageName, config.Verbose); err != nil {
return fmt.Errorf("failed to ensure image availability: %w", err)
if err := ensureImageAvailable(ctx, cli, imageName, config.Verbose); err != nil { //nolint:noinlineerr
return fmt.Errorf("ensuring image availability: %w", err)
}
resp, err := createGoTestContainer(ctx, cli, config, containerName, absLogsDir, goTestCmd)
if err != nil {
return fmt.Errorf("failed to create container: %w", err)
return fmt.Errorf("creating container: %w", err)
}
if config.Verbose {
log.Printf("Created container: %s", resp.ID)
}
if err := cli.ContainerStart(ctx, resp.ID, container.StartOptions{}); err != nil {
return fmt.Errorf("failed to start container: %w", err)
if err := cli.ContainerStart(ctx, resp.ID, container.StartOptions{}); err != nil { //nolint:noinlineerr
return fmt.Errorf("starting container: %w", err)
}
log.Printf("Starting test: %s", config.TestPattern)
log.Printf("Run ID: %s", runID)
log.Printf("Monitor with: docker logs -f %s", containerName)
log.Printf("Logs directory: %s", logsDir)
// Start stats collection for container resource monitoring (if enabled)
var statsCollector *StatsCollector
if config.Stats {
var err error
statsCollector, err = NewStatsCollector(ctx)
if err != nil {
if config.Verbose {
log.Printf("Warning: failed to create stats collector: %v", err)
}
statsCollector = nil
}
if statsCollector != nil {
defer statsCollector.Close()
// Start stats collection immediately - no need for complex retry logic
// The new implementation monitors Docker events and will catch containers as they start
err := statsCollector.StartCollection(ctx, runID, config.Verbose)
if err != nil {
if config.Verbose {
log.Printf("Warning: failed to start stats collection: %v", err)
}
}
defer statsCollector.StopCollection()
}
}
exitCode, err := streamAndWait(ctx, cli, resp.ID)
// Ensure all containers have finished and logs are flushed before extracting artifacts
if waitErr := waitForContainerFinalization(ctx, cli, resp.ID, config.Verbose); waitErr != nil && config.Verbose {
waitErr := waitForContainerFinalization(ctx, cli, resp.ID, config.Verbose)
if waitErr != nil && config.Verbose {
log.Printf("Warning: failed to wait for container finalization: %v", waitErr)
}
// Extract artifacts from test containers before cleanup
if err := extractArtifactsFromContainers(ctx, resp.ID, logsDir, config.Verbose); err != nil && config.Verbose {
if err := extractArtifactsFromContainers(ctx, resp.ID, logsDir, config.Verbose); err != nil && config.Verbose { //nolint:noinlineerr
log.Printf("Warning: failed to extract artifacts from containers: %v", err)
}
// Always list control files regardless of test outcome
listControlFiles(logsDir)
// Print stats summary and check memory limits if enabled
if config.Stats && statsCollector != nil {
violations := statsCollector.PrintSummaryAndCheckLimits(config.HSMemoryLimit, config.TSMemoryLimit)
if len(violations) > 0 {
log.Printf("MEMORY LIMIT VIOLATIONS DETECTED:")
log.Printf("=================================")
for _, violation := range violations {
log.Printf("Container %s exceeded memory limit: %.1f MB > %.1f MB",
violation.ContainerName, violation.MaxMemoryMB, violation.LimitMB)
}
return fmt.Errorf("test failed: %d %w", len(violations), ErrMemoryLimitViolations)
}
}
shouldCleanup := config.CleanAfter && (!config.KeepOnFailure || exitCode == 0)
if shouldCleanup {
if config.Verbose {
log.Printf("Running post-test cleanup...")
log.Printf("Running post-test cleanup for run %s...", runID)
}
if cleanErr := cleanupAfterTest(ctx, cli, resp.ID); cleanErr != nil && config.Verbose {
cleanErr := cleanupAfterTest(ctx, cli, resp.ID, runID)
if cleanErr != nil && config.Verbose {
log.Printf("Warning: post-test cleanup failed: %v", cleanErr)
}
// Clean up artifacts from successful tests to save disk space in CI
if exitCode == 0 {
if config.Verbose {
log.Printf("Test succeeded, cleaning up artifacts to save disk space...")
}
cleanErr := cleanupSuccessfulTestArtifacts(logsDir, config.Verbose)
if cleanErr != nil && config.Verbose {
log.Printf("Warning: artifact cleanup failed: %v", cleanErr)
}
}
}
if err != nil {
return fmt.Errorf("test execution failed: %w", err)
return fmt.Errorf("executing test: %w", err)
}
if exitCode != 0 {
@@ -150,7 +223,7 @@ func buildGoTestCommand(config *RunConfig) []string {
func createGoTestContainer(ctx context.Context, cli *client.Client, config *RunConfig, containerName, logsDir string, goTestCmd []string) (container.CreateResponse, error) {
pwd, err := os.Getwd()
if err != nil {
return container.CreateResponse{}, fmt.Errorf("failed to get working directory: %w", err)
return container.CreateResponse{}, fmt.Errorf("getting working directory: %w", err)
}
projectRoot := findProjectRoot(pwd)
@@ -161,6 +234,28 @@ func createGoTestContainer(ctx context.Context, cli *client.Client, config *RunC
fmt.Sprintf("HEADSCALE_INTEGRATION_POSTGRES=%d", boolToInt(config.UsePostgres)),
"HEADSCALE_INTEGRATION_RUN_ID=" + runID,
}
// Pass through CI environment variable for CI detection
if ci := os.Getenv("CI"); ci != "" {
env = append(env, "CI="+ci)
}
// Pass through all HEADSCALE_INTEGRATION_* environment variables
for _, e := range os.Environ() {
if strings.HasPrefix(e, "HEADSCALE_INTEGRATION_") {
// Skip the ones we already set explicitly
if strings.HasPrefix(e, "HEADSCALE_INTEGRATION_POSTGRES=") ||
strings.HasPrefix(e, "HEADSCALE_INTEGRATION_RUN_ID=") {
continue
}
env = append(env, e)
}
}
// Set GOCACHE to a known location (used by both bind mount and volume cases)
env = append(env, "GOCACHE=/cache/go-build")
containerConfig := &container.Config{
Image: "golang:" + config.GoVersion,
Cmd: goTestCmd,
@@ -180,20 +275,43 @@ func createGoTestContainer(ctx context.Context, cli *client.Client, config *RunC
log.Printf("Using Docker socket: %s", dockerSocketPath)
}
binds := []string{
fmt.Sprintf("%s:%s", projectRoot, projectRoot),
dockerSocketPath + ":/var/run/docker.sock",
logsDir + ":/tmp/control",
}
// Use bind mounts for Go cache if provided via environment variables,
// otherwise fall back to Docker volumes for local development
var mounts []mount.Mount
goCache := os.Getenv("HEADSCALE_INTEGRATION_GO_CACHE")
goBuildCache := os.Getenv("HEADSCALE_INTEGRATION_GO_BUILD_CACHE")
if goCache != "" {
binds = append(binds, goCache+":/go")
} else {
mounts = append(mounts, mount.Mount{
Type: mount.TypeVolume,
Source: "hs-integration-go-cache",
Target: "/go",
})
}
if goBuildCache != "" {
binds = append(binds, goBuildCache+":/cache/go-build")
} else {
mounts = append(mounts, mount.Mount{
Type: mount.TypeVolume,
Source: "hs-integration-go-build-cache",
Target: "/cache/go-build",
})
}
hostConfig := &container.HostConfig{
AutoRemove: false, // We'll remove manually for better control
Binds: []string{
fmt.Sprintf("%s:%s", projectRoot, projectRoot),
dockerSocketPath + ":/var/run/docker.sock",
logsDir + ":/tmp/control",
},
Mounts: []mount.Mount{
{
Type: mount.TypeVolume,
Source: "hs-integration-go-cache",
Target: "/go",
},
},
Binds: binds,
Mounts: mounts,
}
return cli.ContainerCreate(ctx, containerConfig, hostConfig, nil, nil, containerName)
@@ -207,7 +325,7 @@ func streamAndWait(ctx context.Context, cli *client.Client, containerID string)
Follow: true,
})
if err != nil {
return -1, fmt.Errorf("failed to get container logs: %w", err)
return -1, fmt.Errorf("getting container logs: %w", err)
}
defer out.Close()
@@ -219,7 +337,7 @@ func streamAndWait(ctx context.Context, cli *client.Client, containerID string)
select {
case err := <-errCh:
if err != nil {
return -1, fmt.Errorf("error waiting for container: %w", err)
return -1, fmt.Errorf("waiting for container: %w", err)
}
case status := <-statusCh:
return int(status.StatusCode), nil
@@ -233,7 +351,7 @@ func waitForContainerFinalization(ctx context.Context, cli *client.Client, testC
// First, get all related test containers
containers, err := cli.ContainerList(ctx, container.ListOptions{All: true})
if err != nil {
return fmt.Errorf("failed to list containers: %w", err)
return fmt.Errorf("listing containers: %w", err)
}
testContainers := getCurrentTestContainers(containers, testContainerID, verbose)
@@ -242,6 +360,7 @@ func waitForContainerFinalization(ctx context.Context, cli *client.Client, testC
maxWaitTime := 10 * time.Second
checkInterval := 500 * time.Millisecond
timeout := time.After(maxWaitTime)
ticker := time.NewTicker(checkInterval)
defer ticker.Stop()
@@ -251,6 +370,7 @@ func waitForContainerFinalization(ctx context.Context, cli *client.Client, testC
if verbose {
log.Printf("Timeout waiting for container finalization, proceeding with artifact extraction")
}
return nil
case <-ticker.C:
allFinalized := true
@@ -261,12 +381,14 @@ func waitForContainerFinalization(ctx context.Context, cli *client.Client, testC
if verbose {
log.Printf("Warning: failed to inspect container %s: %v", testCont.name, err)
}
continue
}
// Check if container is in a final state
if !isContainerFinalized(inspect.State) {
allFinalized = false
if verbose {
log.Printf("Container %s still finalizing (state: %s)", testCont.name, inspect.State.Status)
}
@@ -279,6 +401,7 @@ func waitForContainerFinalization(ctx context.Context, cli *client.Client, testC
if verbose {
log.Printf("All test containers finalized, ready for artifact extraction")
}
return nil
}
}
@@ -295,13 +418,15 @@ func isContainerFinalized(state *container.State) bool {
func findProjectRoot(startPath string) string {
current := startPath
for {
if _, err := os.Stat(filepath.Join(current, "go.mod")); err == nil {
if _, err := os.Stat(filepath.Join(current, "go.mod")); err == nil { //nolint:noinlineerr
return current
}
parent := filepath.Dir(current)
if parent == current {
return startPath
}
current = parent
}
}
@@ -311,34 +436,37 @@ func boolToInt(b bool) int {
if b {
return 1
}
return 0
}
// DockerContext represents Docker context information.
type DockerContext struct {
Name string `json:"Name"`
Metadata map[string]interface{} `json:"Metadata"`
Endpoints map[string]interface{} `json:"Endpoints"`
Current bool `json:"Current"`
Name string `json:"Name"`
Metadata map[string]any `json:"Metadata"`
Endpoints map[string]any `json:"Endpoints"`
Current bool `json:"Current"`
}
// createDockerClient creates a Docker client with context detection.
func createDockerClient() (*client.Client, error) {
contextInfo, err := getCurrentDockerContext()
func createDockerClient(ctx context.Context) (*client.Client, error) {
contextInfo, err := getCurrentDockerContext(ctx)
if err != nil {
return client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
}
var clientOpts []client.Opt
clientOpts = append(clientOpts, client.WithAPIVersionNegotiation())
if contextInfo != nil {
if endpoints, ok := contextInfo.Endpoints["docker"]; ok {
if endpointMap, ok := endpoints.(map[string]interface{}); ok {
if endpointMap, ok := endpoints.(map[string]any); ok {
if host, ok := endpointMap["Host"].(string); ok {
if runConfig.Verbose {
log.Printf("Using Docker host from context '%s': %s", contextInfo.Name, host)
}
clientOpts = append(clientOpts, client.WithHost(host))
}
}
@@ -353,16 +481,17 @@ func createDockerClient() (*client.Client, error) {
}
// getCurrentDockerContext retrieves the current Docker context information.
func getCurrentDockerContext() (*DockerContext, error) {
cmd := exec.Command("docker", "context", "inspect")
func getCurrentDockerContext(ctx context.Context) (*DockerContext, error) {
cmd := exec.CommandContext(ctx, "docker", "context", "inspect")
output, err := cmd.Output()
if err != nil {
return nil, fmt.Errorf("failed to get docker context: %w", err)
return nil, fmt.Errorf("getting docker context: %w", err)
}
var contexts []DockerContext
if err := json.Unmarshal(output, &contexts); err != nil {
return nil, fmt.Errorf("failed to parse docker context: %w", err)
if err := json.Unmarshal(output, &contexts); err != nil { //nolint:noinlineerr
return nil, fmt.Errorf("parsing docker context: %w", err)
}
if len(contexts) > 0 {
@@ -379,28 +508,58 @@ func getDockerSocketPath() string {
return "/var/run/docker.sock"
}
// ensureImageAvailable pulls the specified Docker image to ensure it's available.
// checkImageAvailableLocally checks if the specified Docker image is available locally.
func checkImageAvailableLocally(ctx context.Context, cli *client.Client, imageName string) (bool, error) {
_, _, err := cli.ImageInspectWithRaw(ctx, imageName) //nolint:staticcheck // SA1019: deprecated but functional
if err != nil {
if client.IsErrNotFound(err) { //nolint:staticcheck // SA1019: deprecated but functional
return false, nil
}
return false, fmt.Errorf("inspecting image %s: %w", imageName, err)
}
return true, nil
}
// ensureImageAvailable checks if the image is available locally first, then pulls if needed.
func ensureImageAvailable(ctx context.Context, cli *client.Client, imageName string, verbose bool) error {
// First check if image is available locally
available, err := checkImageAvailableLocally(ctx, cli, imageName)
if err != nil {
return fmt.Errorf("checking local image availability: %w", err)
}
if available {
if verbose {
log.Printf("Image %s is available locally", imageName)
}
return nil
}
// Image not available locally, try to pull it
if verbose {
log.Printf("Pulling image %s...", imageName)
log.Printf("Image %s not found locally, pulling...", imageName)
}
reader, err := cli.ImagePull(ctx, imageName, image.PullOptions{})
if err != nil {
return fmt.Errorf("failed to pull image %s: %w", imageName, err)
return fmt.Errorf("pulling image %s: %w", imageName, err)
}
defer reader.Close()
if verbose {
_, err = io.Copy(os.Stdout, reader)
if err != nil {
return fmt.Errorf("failed to read pull output: %w", err)
return fmt.Errorf("reading pull output: %w", err)
}
} else {
_, err = io.Copy(io.Discard, reader)
if err != nil {
return fmt.Errorf("failed to read pull output: %w", err)
return fmt.Errorf("reading pull output: %w", err)
}
log.Printf("Image %s pulled successfully", imageName)
}
@@ -415,9 +574,11 @@ func listControlFiles(logsDir string) {
return
}
var logFiles []string
var dataFiles []string
var dataDirs []string
var (
logFiles []string
dataFiles []string
dataDirs []string
)
for _, entry := range entries {
name := entry.Name()
@@ -446,6 +607,7 @@ func listControlFiles(logsDir string) {
if len(logFiles) > 0 {
log.Printf("Headscale logs:")
for _, file := range logFiles {
log.Printf(" %s", file)
}
@@ -453,9 +615,11 @@ func listControlFiles(logsDir string) {
if len(dataFiles) > 0 || len(dataDirs) > 0 {
log.Printf("Headscale data:")
for _, file := range dataFiles {
log.Printf(" %s", file)
}
for _, dir := range dataDirs {
log.Printf(" %s/", dir)
}
@@ -464,25 +628,27 @@ func listControlFiles(logsDir string) {
// extractArtifactsFromContainers collects container logs and files from the specific test run.
func extractArtifactsFromContainers(ctx context.Context, testContainerID, logsDir string, verbose bool) error {
cli, err := createDockerClient()
cli, err := createDockerClient(ctx)
if err != nil {
return fmt.Errorf("failed to create Docker client: %w", err)
return fmt.Errorf("creating Docker client: %w", err)
}
defer cli.Close()
// List all containers
containers, err := cli.ContainerList(ctx, container.ListOptions{All: true})
if err != nil {
return fmt.Errorf("failed to list containers: %w", err)
return fmt.Errorf("listing containers: %w", err)
}
// Get containers from the specific test run
currentTestContainers := getCurrentTestContainers(containers, testContainerID, verbose)
extractedCount := 0
for _, cont := range currentTestContainers {
// Extract container logs and tar files
if err := extractContainerArtifacts(ctx, cli, cont.ID, cont.name, logsDir, verbose); err != nil {
err := extractContainerArtifacts(ctx, cli, cont.ID, cont.name, logsDir, verbose)
if err != nil {
if verbose {
log.Printf("Warning: failed to extract artifacts from container %s (%s): %v", cont.name, cont.ID[:12], err)
}
@@ -490,6 +656,7 @@ func extractArtifactsFromContainers(ctx context.Context, testContainerID, logsDi
if verbose {
log.Printf("Extracted artifacts from container %s (%s)", cont.name, cont.ID[:12])
}
extractedCount++
}
}
@@ -513,11 +680,13 @@ func getCurrentTestContainers(containers []container.Summary, testContainerID st
// Find the test container to get its run ID label
var runID string
for _, cont := range containers {
if cont.ID == testContainerID {
if cont.Labels != nil {
runID = cont.Labels["hi.run-id"]
}
break
}
}
@@ -558,18 +727,21 @@ func getCurrentTestContainers(containers []container.Summary, testContainerID st
// extractContainerArtifacts saves logs and tar files from a container.
func extractContainerArtifacts(ctx context.Context, cli *client.Client, containerID, containerName, logsDir string, verbose bool) error {
// Ensure the logs directory exists
if err := os.MkdirAll(logsDir, 0o755); err != nil {
return fmt.Errorf("failed to create logs directory: %w", err)
err := os.MkdirAll(logsDir, defaultDirPerm)
if err != nil {
return fmt.Errorf("creating logs directory: %w", err)
}
// Extract container logs
if err := extractContainerLogs(ctx, cli, containerID, containerName, logsDir, verbose); err != nil {
return fmt.Errorf("failed to extract logs: %w", err)
err = extractContainerLogs(ctx, cli, containerID, containerName, logsDir, verbose)
if err != nil {
return fmt.Errorf("extracting logs: %w", err)
}
// Extract tar files for headscale containers only
if strings.HasPrefix(containerName, "hs-") {
if err := extractContainerFiles(ctx, cli, containerID, containerName, logsDir, verbose); err != nil {
err := extractContainerFiles(ctx, cli, containerID, containerName, logsDir, verbose)
if err != nil {
if verbose {
log.Printf("Warning: failed to extract files from %s: %v", containerName, err)
}
@@ -591,7 +763,7 @@ func extractContainerLogs(ctx context.Context, cli *client.Client, containerID,
Tail: "all",
})
if err != nil {
return fmt.Errorf("failed to get container logs: %w", err)
return fmt.Errorf("getting container logs: %w", err)
}
defer logReader.Close()
@@ -605,17 +777,17 @@ func extractContainerLogs(ctx context.Context, cli *client.Client, containerID,
// Demultiplex the Docker logs stream to separate stdout and stderr
_, err = stdcopy.StdCopy(&stdoutBuf, &stderrBuf, logReader)
if err != nil {
return fmt.Errorf("failed to demultiplex container logs: %w", err)
return fmt.Errorf("demultiplexing container logs: %w", err)
}
// Write stdout logs
if err := os.WriteFile(stdoutPath, stdoutBuf.Bytes(), 0o644); err != nil {
return fmt.Errorf("failed to write stdout log: %w", err)
if err := os.WriteFile(stdoutPath, stdoutBuf.Bytes(), 0o644); err != nil { //nolint:gosec,noinlineerr // log files should be readable
return fmt.Errorf("writing stdout log: %w", err)
}
// Write stderr logs
if err := os.WriteFile(stderrPath, stderrBuf.Bytes(), 0o644); err != nil {
return fmt.Errorf("failed to write stderr log: %w", err)
if err := os.WriteFile(stderrPath, stderrBuf.Bytes(), 0o644); err != nil { //nolint:gosec,noinlineerr // log files should be readable
return fmt.Errorf("writing stderr log: %w", err)
}
if verbose {
@@ -633,63 +805,3 @@ func extractContainerFiles(ctx context.Context, cli *client.Client, containerID,
// This function is kept for potential future use or other file types
return nil
}
// logExtractionError logs extraction errors with appropriate level based on error type.
func logExtractionError(artifactType, containerName string, err error, verbose bool) {
if errors.Is(err, ErrFileNotFoundInTar) {
// File not found is expected and only logged in verbose mode
if verbose {
log.Printf("No %s found in container %s", artifactType, containerName)
}
} else {
// Other errors are actual failures and should be logged as warnings
log.Printf("Warning: failed to extract %s from %s: %v", artifactType, containerName, err)
}
}
// extractSingleFile copies a single file from a container.
func extractSingleFile(ctx context.Context, cli *client.Client, containerID, sourcePath, fileName, logsDir string, verbose bool) error {
tarReader, _, err := cli.CopyFromContainer(ctx, containerID, sourcePath)
if err != nil {
return fmt.Errorf("failed to copy %s from container: %w", sourcePath, err)
}
defer tarReader.Close()
// Extract the single file from the tar
filePath := filepath.Join(logsDir, fileName)
if err := extractFileFromTar(tarReader, filepath.Base(sourcePath), filePath); err != nil {
return fmt.Errorf("failed to extract file from tar: %w", err)
}
if verbose {
log.Printf("Extracted %s from %s", fileName, containerID[:12])
}
return nil
}
// extractDirectory copies a directory from a container and extracts its contents.
func extractDirectory(ctx context.Context, cli *client.Client, containerID, sourcePath, dirName, logsDir string, verbose bool) error {
tarReader, _, err := cli.CopyFromContainer(ctx, containerID, sourcePath)
if err != nil {
return fmt.Errorf("failed to copy %s from container: %w", sourcePath, err)
}
defer tarReader.Close()
// Create target directory
targetDir := filepath.Join(logsDir, dirName)
if err := os.MkdirAll(targetDir, 0o755); err != nil {
return fmt.Errorf("failed to create directory %s: %w", targetDir, err)
}
// Extract the directory from the tar
if err := extractDirectoryFromTar(tarReader, targetDir); err != nil {
return fmt.Errorf("failed to extract directory from tar: %w", err)
}
if verbose {
log.Printf("Extracted %s/ from %s", dirName, containerID[:12])
}
return nil
}

View File

@@ -38,13 +38,13 @@ func runDoctorCheck(ctx context.Context) error {
}
// Check 3: Go installation
results = append(results, checkGoInstallation())
results = append(results, checkGoInstallation(ctx))
// Check 4: Git repository
results = append(results, checkGitRepository())
results = append(results, checkGitRepository(ctx))
// Check 5: Required files
results = append(results, checkRequiredFiles())
results = append(results, checkRequiredFiles(ctx))
// Display results
displayDoctorResults(results)
@@ -86,7 +86,7 @@ func checkDockerBinary() DoctorResult {
// checkDockerDaemon verifies Docker daemon is running and accessible.
func checkDockerDaemon(ctx context.Context) DoctorResult {
cli, err := createDockerClient()
cli, err := createDockerClient(ctx)
if err != nil {
return DoctorResult{
Name: "Docker Daemon",
@@ -124,8 +124,8 @@ func checkDockerDaemon(ctx context.Context) DoctorResult {
}
// checkDockerContext verifies Docker context configuration.
func checkDockerContext(_ context.Context) DoctorResult {
contextInfo, err := getCurrentDockerContext()
func checkDockerContext(ctx context.Context) DoctorResult {
contextInfo, err := getCurrentDockerContext(ctx)
if err != nil {
return DoctorResult{
Name: "Docker Context",
@@ -155,7 +155,7 @@ func checkDockerContext(_ context.Context) DoctorResult {
// checkDockerSocket verifies Docker socket accessibility.
func checkDockerSocket(ctx context.Context) DoctorResult {
cli, err := createDockerClient()
cli, err := createDockerClient(ctx)
if err != nil {
return DoctorResult{
Name: "Docker Socket",
@@ -190,9 +190,9 @@ func checkDockerSocket(ctx context.Context) DoctorResult {
}
}
// checkGolangImage verifies we can access the golang Docker image.
// checkGolangImage verifies the golang Docker image is available locally or can be pulled.
func checkGolangImage(ctx context.Context) DoctorResult {
cli, err := createDockerClient()
cli, err := createDockerClient(ctx)
if err != nil {
return DoctorResult{
Name: "Golang Image",
@@ -205,17 +205,40 @@ func checkGolangImage(ctx context.Context) DoctorResult {
goVersion := detectGoVersion()
imageName := "golang:" + goVersion
// Check if we can pull the image
// First check if image is available locally
available, err := checkImageAvailableLocally(ctx, cli, imageName)
if err != nil {
return DoctorResult{
Name: "Golang Image",
Status: "FAIL",
Message: fmt.Sprintf("Cannot check golang image %s: %v", imageName, err),
Suggestions: []string{
"Check Docker daemon status",
"Try: docker images | grep golang",
},
}
}
if available {
return DoctorResult{
Name: "Golang Image",
Status: "PASS",
Message: fmt.Sprintf("Golang image %s is available locally", imageName),
}
}
// Image not available locally, try to pull it
err = ensureImageAvailable(ctx, cli, imageName, false)
if err != nil {
return DoctorResult{
Name: "Golang Image",
Status: "FAIL",
Message: fmt.Sprintf("Cannot pull golang image %s: %v", imageName, err),
Message: fmt.Sprintf("Golang image %s not available locally and cannot pull: %v", imageName, err),
Suggestions: []string{
"Check internet connectivity",
"Verify Docker Hub access",
"Try: docker pull " + imageName,
"Or run tests offline if image was pulled previously",
},
}
}
@@ -223,12 +246,12 @@ func checkGolangImage(ctx context.Context) DoctorResult {
return DoctorResult{
Name: "Golang Image",
Status: "PASS",
Message: fmt.Sprintf("Golang image %s is available", imageName),
Message: fmt.Sprintf("Golang image %s is now available", imageName),
}
}
// checkGoInstallation verifies Go is installed and working.
func checkGoInstallation() DoctorResult {
func checkGoInstallation(ctx context.Context) DoctorResult {
_, err := exec.LookPath("go")
if err != nil {
return DoctorResult{
@@ -242,7 +265,8 @@ func checkGoInstallation() DoctorResult {
}
}
cmd := exec.Command("go", "version")
cmd := exec.CommandContext(ctx, "go", "version")
output, err := cmd.Output()
if err != nil {
return DoctorResult{
@@ -262,8 +286,9 @@ func checkGoInstallation() DoctorResult {
}
// checkGitRepository verifies we're in a git repository.
func checkGitRepository() DoctorResult {
cmd := exec.Command("git", "rev-parse", "--git-dir")
func checkGitRepository(ctx context.Context) DoctorResult {
cmd := exec.CommandContext(ctx, "git", "rev-parse", "--git-dir")
err := cmd.Run()
if err != nil {
return DoctorResult{
@@ -285,7 +310,7 @@ func checkGitRepository() DoctorResult {
}
// checkRequiredFiles verifies required files exist.
func checkRequiredFiles() DoctorResult {
func checkRequiredFiles(ctx context.Context) DoctorResult {
requiredFiles := []string{
"go.mod",
"integration/",
@@ -293,9 +318,12 @@ func checkRequiredFiles() DoctorResult {
}
var missingFiles []string
for _, file := range requiredFiles {
cmd := exec.Command("test", "-e", file)
if err := cmd.Run(); err != nil {
cmd := exec.CommandContext(ctx, "test", "-e", file)
err := cmd.Run()
if err != nil {
missingFiles = append(missingFiles, file)
}
}
@@ -327,6 +355,7 @@ func displayDoctorResults(results []DoctorResult) {
for _, result := range results {
var icon string
switch result.Status {
case "PASS":
icon = "✅"

View File

@@ -79,13 +79,18 @@ func main() {
}
func cleanAll(ctx context.Context) error {
if err := killTestContainers(ctx); err != nil {
err := killTestContainers(ctx)
if err != nil {
return err
}
if err := pruneDockerNetworks(ctx); err != nil {
err = pruneDockerNetworks(ctx)
if err != nil {
return err
}
if err := cleanOldImages(ctx); err != nil {
err = cleanOldImages(ctx)
if err != nil {
return err
}

View File

@@ -19,11 +19,14 @@ type RunConfig struct {
FailFast bool `flag:"failfast,default=true,Stop on first test failure"`
UsePostgres bool `flag:"postgres,default=false,Use PostgreSQL instead of SQLite"`
GoVersion string `flag:"go-version,Go version to use (auto-detected from go.mod)"`
CleanBefore bool `flag:"clean-before,default=true,Clean resources before test"`
CleanBefore bool `flag:"clean-before,default=true,Clean stale resources before test"`
CleanAfter bool `flag:"clean-after,default=true,Clean resources after test"`
KeepOnFailure bool `flag:"keep-on-failure,default=false,Keep containers on test failure"`
LogsDir string `flag:"logs-dir,default=control_logs,Control logs directory"`
Verbose bool `flag:"verbose,default=false,Verbose output"`
Stats bool `flag:"stats,default=false,Collect and display container resource usage statistics"`
HSMemoryLimit float64 `flag:"hs-memory-limit,default=0,Fail test if any Headscale container exceeds this memory limit in MB (0 = disabled)"`
TSMemoryLimit float64 `flag:"ts-memory-limit,default=0,Fail test if any Tailscale container exceeds this memory limit in MB (0 = disabled)"`
}
// runIntegrationTest executes the integration test workflow.
@@ -45,7 +48,9 @@ func runIntegrationTest(env *command.Env) error {
if runConfig.Verbose {
log.Printf("Running pre-flight system checks...")
}
if err := runDoctorCheck(env.Context()); err != nil {
err := runDoctorCheck(env.Context())
if err != nil {
return fmt.Errorf("pre-flight checks failed: %w", err)
}
@@ -63,15 +68,15 @@ func runIntegrationTest(env *command.Env) error {
func detectGoVersion() string {
goModPath := filepath.Join("..", "..", "go.mod")
if _, err := os.Stat("go.mod"); err == nil {
if _, err := os.Stat("go.mod"); err == nil { //nolint:noinlineerr
goModPath = "go.mod"
} else if _, err := os.Stat("../../go.mod"); err == nil {
} else if _, err := os.Stat("../../go.mod"); err == nil { //nolint:noinlineerr
goModPath = "../../go.mod"
}
content, err := os.ReadFile(goModPath)
if err != nil {
return "1.24"
return "1.26.1"
}
lines := splitLines(string(content))
@@ -86,13 +91,15 @@ func detectGoVersion() string {
}
}
return "1.24"
return "1.26.1"
}
// splitLines splits a string into lines without using strings.Split.
func splitLines(s string) []string {
var lines []string
var current string
var (
lines []string
current string
)
for _, char := range s {
if char == '\n' {

493
cmd/hi/stats.go Normal file
View File

@@ -0,0 +1,493 @@
package main
import (
"context"
"encoding/json"
"errors"
"fmt"
"log"
"sort"
"strings"
"sync"
"time"
"github.com/docker/docker/api/types"
"github.com/docker/docker/api/types/container"
"github.com/docker/docker/api/types/events"
"github.com/docker/docker/api/types/filters"
"github.com/docker/docker/client"
)
// ErrStatsCollectionAlreadyStarted is returned when trying to start stats collection that is already running.
var ErrStatsCollectionAlreadyStarted = errors.New("stats collection already started")
// ContainerStats represents statistics for a single container.
type ContainerStats struct {
ContainerID string
ContainerName string
Stats []StatsSample
mutex sync.RWMutex
}
// StatsSample represents a single stats measurement.
type StatsSample struct {
Timestamp time.Time
CPUUsage float64 // CPU usage percentage
MemoryMB float64 // Memory usage in MB
}
// StatsCollector manages collection of container statistics.
type StatsCollector struct {
client *client.Client
containers map[string]*ContainerStats
stopChan chan struct{}
wg sync.WaitGroup
mutex sync.RWMutex
collectionStarted bool
}
// NewStatsCollector creates a new stats collector instance.
func NewStatsCollector(ctx context.Context) (*StatsCollector, error) {
cli, err := createDockerClient(ctx)
if err != nil {
return nil, fmt.Errorf("creating Docker client: %w", err)
}
return &StatsCollector{
client: cli,
containers: make(map[string]*ContainerStats),
stopChan: make(chan struct{}),
}, nil
}
// StartCollection begins monitoring all containers and collecting stats for hs- and ts- containers with matching run ID.
func (sc *StatsCollector) StartCollection(ctx context.Context, runID string, verbose bool) error {
sc.mutex.Lock()
defer sc.mutex.Unlock()
if sc.collectionStarted {
return ErrStatsCollectionAlreadyStarted
}
sc.collectionStarted = true
// Start monitoring existing containers
sc.wg.Add(1)
go sc.monitorExistingContainers(ctx, runID, verbose)
// Start Docker events monitoring for new containers
sc.wg.Add(1)
go sc.monitorDockerEvents(ctx, runID, verbose)
if verbose {
log.Printf("Started container monitoring for run ID %s", runID)
}
return nil
}
// StopCollection stops all stats collection.
func (sc *StatsCollector) StopCollection() {
// Check if already stopped without holding lock
sc.mutex.RLock()
if !sc.collectionStarted {
sc.mutex.RUnlock()
return
}
sc.mutex.RUnlock()
// Signal stop to all goroutines
close(sc.stopChan)
// Wait for all goroutines to finish
sc.wg.Wait()
// Mark as stopped
sc.mutex.Lock()
sc.collectionStarted = false
sc.mutex.Unlock()
}
// monitorExistingContainers checks for existing containers that match our criteria.
func (sc *StatsCollector) monitorExistingContainers(ctx context.Context, runID string, verbose bool) {
defer sc.wg.Done()
containers, err := sc.client.ContainerList(ctx, container.ListOptions{})
if err != nil {
if verbose {
log.Printf("Failed to list existing containers: %v", err)
}
return
}
for _, cont := range containers {
if sc.shouldMonitorContainer(cont, runID) {
sc.startStatsForContainer(ctx, cont.ID, cont.Names[0], verbose)
}
}
}
// monitorDockerEvents listens for container start events and begins monitoring relevant containers.
func (sc *StatsCollector) monitorDockerEvents(ctx context.Context, runID string, verbose bool) {
defer sc.wg.Done()
filter := filters.NewArgs()
filter.Add("type", "container")
filter.Add("event", "start")
eventOptions := events.ListOptions{
Filters: filter,
}
events, errs := sc.client.Events(ctx, eventOptions)
for {
select {
case <-sc.stopChan:
return
case <-ctx.Done():
return
case event := <-events:
if event.Type == "container" && event.Action == "start" {
// Get container details
containerInfo, err := sc.client.ContainerInspect(ctx, event.ID) //nolint:staticcheck // SA1019: use Actor.ID
if err != nil {
continue
}
// Convert to types.Container format for consistency
cont := types.Container{ //nolint:staticcheck // SA1019: use container.Summary
ID: containerInfo.ID,
Names: []string{containerInfo.Name},
Labels: containerInfo.Config.Labels,
}
if sc.shouldMonitorContainer(cont, runID) {
sc.startStatsForContainer(ctx, cont.ID, cont.Names[0], verbose)
}
}
case err := <-errs:
if verbose {
log.Printf("Error in Docker events stream: %v", err)
}
return
}
}
}
// shouldMonitorContainer determines if a container should be monitored.
func (sc *StatsCollector) shouldMonitorContainer(cont types.Container, runID string) bool { //nolint:staticcheck // SA1019: use container.Summary
// Check if it has the correct run ID label
if cont.Labels == nil || cont.Labels["hi.run-id"] != runID {
return false
}
// Check if it's an hs- or ts- container
for _, name := range cont.Names {
containerName := strings.TrimPrefix(name, "/")
if strings.HasPrefix(containerName, "hs-") || strings.HasPrefix(containerName, "ts-") {
return true
}
}
return false
}
// startStatsForContainer begins stats collection for a specific container.
func (sc *StatsCollector) startStatsForContainer(ctx context.Context, containerID, containerName string, verbose bool) {
containerName = strings.TrimPrefix(containerName, "/")
sc.mutex.Lock()
// Check if we're already monitoring this container
if _, exists := sc.containers[containerID]; exists {
sc.mutex.Unlock()
return
}
sc.containers[containerID] = &ContainerStats{
ContainerID: containerID,
ContainerName: containerName,
Stats: make([]StatsSample, 0),
}
sc.mutex.Unlock()
if verbose {
log.Printf("Starting stats collection for container %s (%s)", containerName, containerID[:12])
}
sc.wg.Add(1)
go sc.collectStatsForContainer(ctx, containerID, verbose)
}
// collectStatsForContainer collects stats for a specific container using Docker API streaming.
func (sc *StatsCollector) collectStatsForContainer(ctx context.Context, containerID string, verbose bool) {
defer sc.wg.Done()
// Use Docker API streaming stats - much more efficient than CLI
statsResponse, err := sc.client.ContainerStats(ctx, containerID, true)
if err != nil {
if verbose {
log.Printf("Failed to get stats stream for container %s: %v", containerID[:12], err)
}
return
}
defer statsResponse.Body.Close()
decoder := json.NewDecoder(statsResponse.Body)
var prevStats *container.Stats //nolint:staticcheck // SA1019: use StatsResponse
for {
select {
case <-sc.stopChan:
return
case <-ctx.Done():
return
default:
var stats container.Stats //nolint:staticcheck // SA1019: use StatsResponse
err := decoder.Decode(&stats)
if err != nil {
// EOF is expected when container stops or stream ends
if err.Error() != "EOF" && verbose {
log.Printf("Failed to decode stats for container %s: %v", containerID[:12], err)
}
return
}
// Calculate CPU percentage (only if we have previous stats)
var cpuPercent float64
if prevStats != nil {
cpuPercent = calculateCPUPercent(prevStats, &stats)
}
// Calculate memory usage in MB
memoryMB := float64(stats.MemoryStats.Usage) / (1024 * 1024)
// Store the sample (skip first sample since CPU calculation needs previous stats)
if prevStats != nil {
// Get container stats reference without holding the main mutex
var (
containerStats *ContainerStats
exists bool
)
sc.mutex.RLock()
containerStats, exists = sc.containers[containerID]
sc.mutex.RUnlock()
if exists && containerStats != nil {
containerStats.mutex.Lock()
containerStats.Stats = append(containerStats.Stats, StatsSample{
Timestamp: time.Now(),
CPUUsage: cpuPercent,
MemoryMB: memoryMB,
})
containerStats.mutex.Unlock()
}
}
// Save current stats for next iteration
prevStats = &stats
}
}
}
// calculateCPUPercent calculates CPU usage percentage from Docker stats.
func calculateCPUPercent(prevStats, stats *container.Stats) float64 { //nolint:staticcheck // SA1019: use StatsResponse
// CPU calculation based on Docker's implementation
cpuDelta := float64(stats.CPUStats.CPUUsage.TotalUsage) - float64(prevStats.CPUStats.CPUUsage.TotalUsage)
systemDelta := float64(stats.CPUStats.SystemUsage) - float64(prevStats.CPUStats.SystemUsage)
if systemDelta > 0 && cpuDelta >= 0 {
// Calculate CPU percentage: (container CPU delta / system CPU delta) * number of CPUs * 100
numCPUs := float64(len(stats.CPUStats.CPUUsage.PercpuUsage))
if numCPUs == 0 {
// Fallback: if PercpuUsage is not available, assume 1 CPU
numCPUs = 1.0
}
return (cpuDelta / systemDelta) * numCPUs * 100.0
}
return 0.0
}
// ContainerStatsSummary represents summary statistics for a container.
type ContainerStatsSummary struct {
ContainerName string
SampleCount int
CPU StatsSummary
Memory StatsSummary
}
// MemoryViolation represents a container that exceeded the memory limit.
type MemoryViolation struct {
ContainerName string
MaxMemoryMB float64
LimitMB float64
}
// StatsSummary represents min, max, and average for a metric.
type StatsSummary struct {
Min float64
Max float64
Average float64
}
// GetSummary returns a summary of collected statistics.
func (sc *StatsCollector) GetSummary() []ContainerStatsSummary {
// Take snapshot of container references without holding main lock long
sc.mutex.RLock()
containerRefs := make([]*ContainerStats, 0, len(sc.containers))
for _, containerStats := range sc.containers {
containerRefs = append(containerRefs, containerStats)
}
sc.mutex.RUnlock()
summaries := make([]ContainerStatsSummary, 0, len(containerRefs))
for _, containerStats := range containerRefs {
containerStats.mutex.RLock()
stats := make([]StatsSample, len(containerStats.Stats))
copy(stats, containerStats.Stats)
containerName := containerStats.ContainerName
containerStats.mutex.RUnlock()
if len(stats) == 0 {
continue
}
summary := ContainerStatsSummary{
ContainerName: containerName,
SampleCount: len(stats),
}
// Calculate CPU stats
cpuValues := make([]float64, len(stats))
memoryValues := make([]float64, len(stats))
for i, sample := range stats {
cpuValues[i] = sample.CPUUsage
memoryValues[i] = sample.MemoryMB
}
summary.CPU = calculateStatsSummary(cpuValues)
summary.Memory = calculateStatsSummary(memoryValues)
summaries = append(summaries, summary)
}
// Sort by container name for consistent output
sort.Slice(summaries, func(i, j int) bool {
return summaries[i].ContainerName < summaries[j].ContainerName
})
return summaries
}
// calculateStatsSummary calculates min, max, and average for a slice of values.
func calculateStatsSummary(values []float64) StatsSummary {
if len(values) == 0 {
return StatsSummary{}
}
minVal := values[0]
maxVal := values[0]
sum := 0.0
for _, value := range values {
if value < minVal {
minVal = value
}
if value > maxVal {
maxVal = value
}
sum += value
}
return StatsSummary{
Min: minVal,
Max: maxVal,
Average: sum / float64(len(values)),
}
}
// PrintSummary prints the statistics summary to the console.
func (sc *StatsCollector) PrintSummary() {
summaries := sc.GetSummary()
if len(summaries) == 0 {
log.Printf("No container statistics collected")
return
}
log.Printf("Container Resource Usage Summary:")
log.Printf("================================")
for _, summary := range summaries {
log.Printf("Container: %s (%d samples)", summary.ContainerName, summary.SampleCount)
log.Printf(" CPU Usage: Min: %6.2f%% Max: %6.2f%% Avg: %6.2f%%",
summary.CPU.Min, summary.CPU.Max, summary.CPU.Average)
log.Printf(" Memory Usage: Min: %6.1f MB Max: %6.1f MB Avg: %6.1f MB",
summary.Memory.Min, summary.Memory.Max, summary.Memory.Average)
log.Printf("")
}
}
// CheckMemoryLimits checks if any containers exceeded their memory limits.
func (sc *StatsCollector) CheckMemoryLimits(hsLimitMB, tsLimitMB float64) []MemoryViolation {
if hsLimitMB <= 0 && tsLimitMB <= 0 {
return nil
}
summaries := sc.GetSummary()
var violations []MemoryViolation
for _, summary := range summaries {
var limitMB float64
if strings.HasPrefix(summary.ContainerName, "hs-") {
limitMB = hsLimitMB
} else if strings.HasPrefix(summary.ContainerName, "ts-") {
limitMB = tsLimitMB
} else {
continue // Skip containers that don't match our patterns
}
if limitMB > 0 && summary.Memory.Max > limitMB {
violations = append(violations, MemoryViolation{
ContainerName: summary.ContainerName,
MaxMemoryMB: summary.Memory.Max,
LimitMB: limitMB,
})
}
}
return violations
}
// PrintSummaryAndCheckLimits prints the statistics summary and returns memory violations if any.
func (sc *StatsCollector) PrintSummaryAndCheckLimits(hsLimitMB, tsLimitMB float64) []MemoryViolation {
sc.PrintSummary()
return sc.CheckMemoryLimits(hsLimitMB, tsLimitMB)
}
// Close closes the stats collector and cleans up resources.
func (sc *StatsCollector) Close() error {
sc.StopCollection()
return sc.client.Close()
}

View File

@@ -1,100 +0,0 @@
package main
import (
"archive/tar"
"errors"
"fmt"
"io"
"os"
"path/filepath"
"strings"
)
// ErrFileNotFoundInTar indicates a file was not found in the tar archive.
var ErrFileNotFoundInTar = errors.New("file not found in tar")
// extractFileFromTar extracts a single file from a tar reader.
func extractFileFromTar(tarReader io.Reader, fileName, outputPath string) error {
tr := tar.NewReader(tarReader)
for {
header, err := tr.Next()
if err == io.EOF {
break
}
if err != nil {
return fmt.Errorf("failed to read tar header: %w", err)
}
// Check if this is the file we're looking for
if filepath.Base(header.Name) == fileName {
if header.Typeflag == tar.TypeReg {
// Create the output file
outFile, err := os.Create(outputPath)
if err != nil {
return fmt.Errorf("failed to create output file: %w", err)
}
defer outFile.Close()
// Copy file contents
if _, err := io.Copy(outFile, tr); err != nil {
return fmt.Errorf("failed to copy file contents: %w", err)
}
return nil
}
}
}
return fmt.Errorf("%w: %s", ErrFileNotFoundInTar, fileName)
}
// extractDirectoryFromTar extracts all files from a tar reader to a target directory.
func extractDirectoryFromTar(tarReader io.Reader, targetDir string) error {
tr := tar.NewReader(tarReader)
for {
header, err := tr.Next()
if err == io.EOF {
break
}
if err != nil {
return fmt.Errorf("failed to read tar header: %w", err)
}
// Clean the path to prevent directory traversal
cleanName := filepath.Clean(header.Name)
if strings.Contains(cleanName, "..") {
continue // Skip potentially dangerous paths
}
targetPath := filepath.Join(targetDir, filepath.Base(cleanName))
switch header.Typeflag {
case tar.TypeDir:
// Create directory
if err := os.MkdirAll(targetPath, os.FileMode(header.Mode)); err != nil {
return fmt.Errorf("failed to create directory %s: %w", targetPath, err)
}
case tar.TypeReg:
// Create file
outFile, err := os.Create(targetPath)
if err != nil {
return fmt.Errorf("failed to create file %s: %w", targetPath, err)
}
if _, err := io.Copy(outFile, tr); err != nil {
outFile.Close()
return fmt.Errorf("failed to copy file contents: %w", err)
}
outFile.Close()
// Set file permissions
if err := os.Chmod(targetPath, os.FileMode(header.Mode)); err != nil {
return fmt.Errorf("failed to set file permissions: %w", err)
}
}
}
return nil
}

66
cmd/mapresponses/main.go Normal file
View File

@@ -0,0 +1,66 @@
package main
import (
"encoding/json"
"errors"
"fmt"
"os"
"github.com/creachadair/command"
"github.com/creachadair/flax"
"github.com/juanfont/headscale/hscontrol/mapper"
"github.com/juanfont/headscale/integration/integrationutil"
)
type MapConfig struct {
Directory string `flag:"directory,Directory to read map responses from"`
}
var (
mapConfig MapConfig
errDirectoryRequired = errors.New("directory is required")
)
func main() {
root := command.C{
Name: "mapresponses",
Help: "MapResponses is a tool to map and compare map responses from a directory",
Commands: []*command.C{
{
Name: "online",
Help: "",
Usage: "run [test-pattern] [flags]",
SetFlags: command.Flags(flax.MustBind, &mapConfig),
Run: runOnline,
},
command.HelpCommand(nil),
},
}
env := root.NewEnv(nil).MergeFlags(true)
command.RunOrFail(env, os.Args[1:])
}
// runIntegrationTest executes the integration test workflow.
func runOnline(env *command.Env) error {
if mapConfig.Directory == "" {
return errDirectoryRequired
}
resps, err := mapper.ReadMapResponsesFromDirectory(mapConfig.Directory)
if err != nil {
return fmt.Errorf("reading map responses from directory: %w", err)
}
expected := integrationutil.BuildExpectedOnlineMap(resps)
out, err := json.MarshalIndent(expected, "", " ")
if err != nil {
return fmt.Errorf("marshaling expected online map: %w", err)
}
os.Stderr.Write(out)
os.Stderr.Write([]byte("\n"))
return nil
}

View File

@@ -20,6 +20,7 @@ listen_addr: 127.0.0.1:8080
# Address to listen to /metrics and /debug, you may want
# to keep this endpoint private to your internal network
# Use an emty value to disable the metrics listener.
metrics_listen_addr: 127.0.0.1:9090
# Address to listen for gRPC.
@@ -49,18 +50,29 @@ noise:
# List of IP prefixes to allocate tailaddresses from.
# Each prefix consists of either an IPv4 or IPv6 address,
# and the associated prefix length, delimited by a slash.
# It must be within IP ranges supported by the Tailscale
# client - i.e., subnets of 100.64.0.0/10 and fd7a:115c:a1e0::/48.
# See below:
# IPv6: https://github.com/tailscale/tailscale/blob/22ebb25e833264f58d7c3f534a8b166894a89536/net/tsaddr/tsaddr.go#LL81C52-L81C71
#
# WARNING: These prefixes MUST be subsets of the standard Tailscale ranges:
# - IPv4: 100.64.0.0/10 (CGNAT range)
# - IPv6: fd7a:115c:a1e0::/48 (Tailscale ULA range)
#
# Using a SUBSET of these ranges is supported and useful if you want to
# limit IP allocation to a smaller block (e.g., 100.64.0.0/24).
#
# Using ranges OUTSIDE of CGNAT/ULA is NOT supported and will cause
# undefined behaviour. The Tailscale client has hard-coded assumptions
# about these ranges and will break in subtle, hard-to-debug ways.
#
# See:
# IPv4: https://github.com/tailscale/tailscale/blob/22ebb25e833264f58d7c3f534a8b166894a89536/net/tsaddr/tsaddr.go#L33
# Any other range is NOT supported, and it will cause unexpected issues.
# IPv6: https://github.com/tailscale/tailscale/blob/22ebb25e833264f58d7c3f534a8b166894a89536/net/tsaddr/tsaddr.go#LL81C52-L81C71
prefixes:
v4: 100.64.0.0/10
v6: fd7a:115c:a1e0::/48
# Strategy used for allocation of IPs to nodes, available options:
# - sequential (default): assigns the next free IP from the previous given IP.
# - sequential (default): assigns the next free IP from the previous given
# IP. A best-effort approach is used and Headscale might leave holes in the
# IP range or fill up existing holes in the IP range.
# - random: assigns the next free IP from a pseudo-random IP generator (crypto/rand).
allocation: sequential
@@ -105,7 +117,7 @@ derp:
# For better connection stability (especially when using an Exit-Node and DNS is not working),
# it is possible to optionally add the public IPv4 and IPv6 address to the Derp-Map using:
ipv4: 1.2.3.4
ipv4: 198.51.100.1
ipv6: 2001:db8::1
# List of externally available DERP maps encoded in JSON
@@ -128,7 +140,7 @@ derp:
auto_update_enabled: true
# How often should we check for DERP updates?
update_frequency: 24h
update_frequency: 3h
# Disables the automatic check for headscale updates on startup
disable_check_updates: false
@@ -225,9 +237,11 @@ tls_cert_path: ""
tls_key_path: ""
log:
# Valid log levels: panic, fatal, error, warn, info, debug, trace
level: info
# Output formatting for logs: text or json
format: text
level: info
## Policy
# headscale supports Tailscale's ACL policies.
@@ -273,9 +287,9 @@ dns:
# `hostname.base_domain` (e.g., _myhost.example.com_).
base_domain: example.com
# Whether to use the local DNS settings of a node (default) or override the
# local DNS settings and force the use of Headscale's DNS configuration.
override_local_dns: false
# Whether to use the local DNS settings of a node or override the local DNS
# settings (default) and force the use of Headscale's DNS configuration.
override_local_dns: true
# List of DNS servers to expose to clients.
nameservers:
@@ -291,8 +305,7 @@ dns:
# Split DNS (see https://tailscale.com/kb/1054/dns/),
# a map of domains and which DNS server to use for each.
split:
{}
split: {}
# foo.bar.com:
# - 1.1.1.1
# darp.headscale.net:
@@ -358,6 +371,12 @@ unix_socket_permission: "0770"
# # required "openid" scope.
# scope: ["openid", "profile", "email"]
#
# # Only verified email addresses are synchronized to the user profile by
# # default. Unverified emails may be allowed in case an identity provider
# # does not send the "email_verified: true" claim or email verification is
# # not required.
# email_verified_required: true
#
# # Provide custom key/value pairs which get sent to the identity provider's
# # authorization endpoint.
# extra_params:
@@ -390,11 +409,13 @@ unix_socket_permission: "0770"
# method: S256
# Logtail configuration
# Logtail is Tailscales logging and auditing infrastructure, it allows the control panel
# to instruct tailscale nodes to log their activity to a remote server.
# Logtail is Tailscales logging and auditing infrastructure, it allows the
# control panel to instruct tailscale nodes to log their activity to a remote
# server. To disable logging on the client side, please refer to:
# https://tailscale.com/kb/1011/log-mesh-traffic#opting-out-of-client-logging
logtail:
# Enable logtail for this headscales clients.
# As there is currently no support for overriding the log server in headscale, this is
# Enable logtail for tailscale nodes of this Headscale instance.
# As there is currently no support for overriding the log server in Headscale, this is
# disabled by default. Enabling this will make your clients send logs to Tailscale Inc.
enabled: false
@@ -402,3 +423,24 @@ logtail:
# default static port 41641. This option is intended as a workaround for some buggy
# firewall devices. See https://tailscale.com/kb/1181/firewalls/ for more information.
randomize_client_port: false
# Taildrop configuration
# Taildrop is the file sharing feature of Tailscale, allowing nodes to send files to each other.
# https://tailscale.com/kb/1106/taildrop/
taildrop:
# Enable or disable Taildrop for all nodes.
# When enabled, nodes can send files to other nodes owned by the same user.
# Tagged devices and cross-user transfers are not permitted by Tailscale clients.
enabled: true
# Advanced performance tuning parameters.
# The defaults are carefully chosen and should rarely need adjustment.
# Only modify these if you have identified a specific performance issue.
#
# tuning:
# # NodeStore write batching configuration.
# # The NodeStore batches write operations before rebuilding peer relationships,
# # which is computationally expensive. Batching reduces rebuild frequency.
# #
# # node_store_batch_size: 100
# # node_store_batch_timeout: 500ms

View File

@@ -1,5 +1,6 @@
# If you plan to somehow use headscale, please deploy your own DERP infra: https://tailscale.com/kb/1118/custom-derp-servers/
regions:
1: null # Disable DERP region with ID 1
900:
regionid: 900
regioncode: custom
@@ -7,9 +8,9 @@ regions:
nodes:
- name: 900a
regionid: 900
hostname: myderp.mydomain.no
ipv4: 123.123.123.123
ipv6: "2604:a880:400:d1::828:b001"
hostname: myderp.example.com
ipv4: 198.51.100.1
ipv6: 2001:db8::1
stunport: 0
stunonly: false
derpport: 0

View File

@@ -1,3 +1,3 @@
{%
include-markdown "../../CONTRIBUTING.md"
include-markdown "../../CONTRIBUTING.md"
%}

View File

@@ -24,9 +24,12 @@ We are more than happy to exchange emails, or to have dedicated calls before a P
## When/Why is Feature X going to be implemented?
We don't know. We might be working on it. If you're interested in contributing, please post a feature request about it.
We use [GitHub Milestones to plan for upcoming Headscale releases](https://github.com/juanfont/headscale/milestones).
Have a look at [our current plan](https://github.com/juanfont/headscale/milestones) to get an idea when a specific
feature is about to be implemented. The release plan is subject to change at any time.
Please be aware that there are a number of reasons why we might not accept specific contributions:
If you're interested in contributing, please post a feature request about it. Please be aware that there are a number of
reasons why we might not accept specific contributions:
- It is not possible to implement the feature in a way that makes sense in a self-hosted environment.
- Given that we are reverse-engineering Tailscale to satisfy our own curiosity, we might be interested in implementing the feature ourselves.
@@ -44,6 +47,15 @@ For convenience, we also [build container images with headscale](../setup/instal
we don't officially support deploying headscale using Docker**. On our [Discord server](https://discord.gg/c84AZQhmpx)
we have a "docker-issues" channel where you can ask for Docker-specific help to the community.
## What is the recommended update path? Can I skip multiple versions while updating?
Please follow the steps outlined in the [upgrade guide](../setup/upgrade.md) to update your existing Headscale
installation. Its required to update from one stable version to the next (e.g. 0.26.0 → 0.27.1 → 0.28.0) without
skipping minor versions in between. You should always pick the latest available patch release.
Be sure to check the [changelog](https://github.com/juanfont/headscale/blob/main/CHANGELOG.md) for version specific
upgrade instructions and breaking changes.
## Scaling / How many clients does Headscale support?
It depends. As often stated, Headscale is not enterprise software and our focus
@@ -51,22 +63,22 @@ is homelabbers and self-hosters. Of course, we do not prevent people from using
it in a commercial/professional setting and often get questions about scaling.
Please note that when Headscale is developed, performance is not part of the
consideration as the main audience is considered to be users with a moddest
consideration as the main audience is considered to be users with a modest
amount of devices. We focus on correctness and feature parity with Tailscale
SaaS over time.
To understand if you might be able to use Headscale for your usecase, I will
To understand if you might be able to use Headscale for your use case, I will
describe two scenarios in an effort to explain what is the central bottleneck
of Headscale:
1. An environment with 1000 servers
- they rarely "move" (change their endpoints)
- new nodes are added rarely
- they rarely "move" (change their endpoints)
- new nodes are added rarely
2. An environment with 80 laptops/phones (end user devices)
1. An environment with 80 laptops/phones (end user devices)
- nodes move often, e.g. switching from home to office
- nodes move often, e.g. switching from home to office
Headscale calculates a map of all nodes that need to talk to each other,
creating this "world map" requires a lot of CPU time. When an event that
@@ -76,7 +88,7 @@ new "world map" is created for every node in the network.
This means that under certain conditions, Headscale can likely handle 100s
of devices (maybe more), if there is _little to no change_ happening in the
network. For example, in Scenario 1, the process of computing the world map is
extremly demanding due to the size of the network, but when the map has been
extremely demanding due to the size of the network, but when the map has been
created and the nodes are not changing, the Headscale instance will likely
return to a very low resource usage until the next time there is an event
requiring the new map.
@@ -94,14 +106,14 @@ learn about the current state of the world.
We expect that the performance will improve over time as we improve the code
base, but it is not a focus. In general, we will never make the tradeoff to make
things faster on the cost of less maintainable or readable code. We are a small
team and have to optimise for maintainabillity.
team and have to optimise for maintainability.
## Which database should I use?
We recommend the use of SQLite as database for headscale:
- SQLite is simple to setup and easy to use
- It scales well for all of headscale's usecases
- It scales well for all of headscale's use cases
- Development and testing happens primarily on SQLite
- PostgreSQL is still supported, but is considered to be in "maintenance mode"
@@ -130,7 +142,72 @@ connect back to the administrator's node. Why do all nodes see the administrator
`tailscale status`?
This is essentially how Tailscale works. If traffic is allowed to flow in one direction, then both nodes see each other
in their output of `tailscale status`. Traffic is still filtered according to the ACL, with the exception of `tailscale
ping` which is always allowed in either direction.
in their output of `tailscale status`. Traffic is still filtered according to the ACL, with the exception of
`tailscale ping` which is always allowed in either direction.
See also <https://tailscale.com/kb/1087/device-visibility>.
## My policy is stored in the database and Headscale refuses to start due to an invalid policy. How can I recover?
Headscale checks if the policy is valid during startup and refuses to start if it detects an error. The error message
indicates which part of the policy is invalid. Follow these steps to fix your policy:
- Dump the policy to a file: `headscale policy get --bypass-grpc-and-access-database-directly > policy.json`
- Edit and fixup `policy.json`. Use the command `headscale policy check --file policy.json` to validate the policy.
- Load the modified policy: `headscale policy set --bypass-grpc-and-access-database-directly --file policy.json`
- Start Headscale as usual.
!!! warning "Full server configuration required"
The above commands to get/set the policy require a complete server configuration file including database settings. A
minimal config to [control Headscale via remote CLI](../ref/api.md#grpc) is not sufficient. You may use
`headscale -c /path/to/config.yaml` to specify the path to an alternative configuration file.
## How can I migrate back to the recommended IP prefixes?
Tailscale only supports the IP prefixes `100.64.0.0/10` and `fd7a:115c:a1e0::/48` or smaller subnets thereof. The
following steps can be used to migrate from unsupported IP prefixes back to the supported and recommended ones.
!!! warning "Backup and test in a demo environment required"
The commands below update the IP addresses of all nodes in your tailnet and this might have a severe impact in your
specific environment. At a minimum:
- [Create a backup of your database](../setup/upgrade.md#backup)
- Test the commands below in a representive demo environment. This allows to catch subsequent connectivity errors
early and see how the tailnet behaves in your specific environment.
- Stop Headscale
- Restore the default prefixes in the [configuration file](../ref/configuration.md):
```yaml
prefixes:
v4: 100.64.0.0/10
v6: fd7a:115c:a1e0::/48
```
- Update the `nodes.ipv4` and `nodes.ipv6` columns in the database and assign each node a unique IPv4 and IPv6 address.
The following SQL statement assigns IP addresses based on the node ID:
```sql
UPDATE nodes
SET ipv4=concat('100.64.', id/256, '.', id%256),
ipv6=concat('fd7a:115c:a1e0::', format('%x', id));
```
- Update the [policy](../ref/acls.md) to reflect the IP address changes (if any)
- Start Headscale
Nodes should reconnect within a few seconds and pickup their newly assigned IP addresses.
## How can I avoid to send logs to Tailscale Inc?
A Tailscale client [collects logs about its operation and connection attempts with other
clients](https://tailscale.com/kb/1011/log-mesh-traffic#client-logs) and sends them to a central log service operated by
Tailscale Inc.
Headscale, by default, instructs clients to disable log submission to the central log service. This configuration is
applied by a client once it successfully connected with Headscale. See the configuration option `logtail.enabled` in the
[configuration file](../ref/configuration.md) for details.
Alternatively, logging can also be disabled on the client side. This is independent of Headscale and opting out of
client logging disables log submission early during client startup. The configuration is operating system specific and
is usually achieved by setting the environment variable `TS_NO_LOGS_NO_SUPPORT=true` or by passing the flag
`--no-logs-no-support` to `tailscaled`. See
<https://tailscale.com/kb/1011/log-mesh-traffic#opting-out-of-client-logging> for details.

View File

@@ -5,30 +5,31 @@ to provide self-hosters and hobbyists with an open-source server they can use fo
provides on overview of Headscale's feature and compatibility with the Tailscale control server:
- [x] Full "base" support of Tailscale's features
- [x] Node registration
- [x] Interactive
- [x] Pre authenticated key
- [x] [Node registration](../ref/registration.md)
- [x] [Web authentication](../ref/registration.md#web-authentication)
- [x] [Pre authenticated key](../ref/registration.md#pre-authenticated-key)
- [x] [DNS](../ref/dns.md)
- [x] [MagicDNS](https://tailscale.com/kb/1081/magicdns)
- [x] [Global and restricted nameservers (split DNS)](https://tailscale.com/kb/1054/dns#nameservers)
- [x] [search domains](https://tailscale.com/kb/1054/dns#search-domains)
- [x] [Extra DNS records (Headscale only)](../ref/dns.md#setting-extra-dns-records)
- [x] [Taildrop (File Sharing)](https://tailscale.com/kb/1106/taildrop)
- [x] [Tags](../ref/tags.md)
- [x] [Routes](../ref/routes.md)
- [x] [Subnet routers](../ref/routes.md#subnet-router)
- [x] [Exit nodes](../ref/routes.md#exit-node)
- [x] Dual stack (IPv4 and IPv6)
- [x] Ephemeral nodes
- [x] Embedded [DERP server](https://tailscale.com/kb/1232/derp-servers)
- [x] Embedded [DERP server](../ref/derp.md)
- [x] Access control lists ([GitHub label "policy"](https://github.com/juanfont/headscale/labels/policy%20%F0%9F%93%9D))
- [x] ACL management via API
- [x] Some [Autogroups](https://tailscale.com/kb/1396/targets#autogroups), currently: `autogroup:internet`,
`autogroup:nonroot`, `autogroup:member`, `autogroup:tagged`
`autogroup:nonroot`, `autogroup:member`, `autogroup:tagged`, `autogroup:self`
- [x] [Auto approvers](https://tailscale.com/kb/1337/acl-syntax#auto-approvers) for [subnet
routers](../ref/routes.md#automatically-approve-routes-of-a-subnet-router) and [exit
nodes](../ref/routes.md#automatically-approve-an-exit-node-with-auto-approvers)
- [x] [Tailscale SSH](https://tailscale.com/kb/1193/tailscale-ssh)
* [x] [Node registration using Single-Sign-On (OpenID Connect)](../ref/oidc.md) ([GitHub label "OIDC"](https://github.com/juanfont/headscale/labels/OIDC))
- [x] [Node registration using Single-Sign-On (OpenID Connect)](../ref/oidc.md) ([GitHub label "OIDC"](https://github.com/juanfont/headscale/labels/OIDC))
- [x] Basic registration
- [x] Update user profile from identity provider
- [ ] OIDC groups cannot be used in ACLs

BIN
docs/assets/favicon.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

View File

Before

Width:  |  Height:  |  Size: 56 KiB

After

Width:  |  Height:  |  Size: 56 KiB

View File

Before

Width:  |  Height:  |  Size: 34 KiB

After

Width:  |  Height:  |  Size: 34 KiB

View File

@@ -1 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" xml:space="preserve" style="fill-rule:evenodd;clip-rule:evenodd;stroke-linejoin:round;stroke-miterlimit:2" viewBox="0 0 1280 640"><circle cx="141.023" cy="338.36" r="117.472" style="fill:#f8b5cb" transform="matrix(.997276 0 0 1.00556 10.0024 -14.823)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 -3.15847 0)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 -3.15847 115.914)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 148.43 115.914)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 148.851 0)"/><circle cx="805.557" cy="336.915" r="118.199" style="fill:#8d8d8d" transform="matrix(.99196 0 0 1 3.36978 -10.2458)"/><circle cx="805.557" cy="336.915" r="118.199" style="fill:#8d8d8d" transform="matrix(.99196 0 0 1 255.633 -10.2458)"/><path d="M680.282 124.808h-68.093v390.325h68.081v-28.23H640V153.228h40.282v-28.42Z" style="fill:#303030"/><path d="M680.282 124.808h-68.093v390.325h68.081v-28.23H640V153.228h40.282v-28.42Z" style="fill:#303030" transform="matrix(-1 0 0 1 1857.19 0)"/></svg>
<svg xmlns="http://www.w3.org/2000/svg" xml:space="preserve" style="fill-rule:evenodd;clip-rule:evenodd;stroke-linejoin:round;stroke-miterlimit:2" viewBox="0 0 1280 640"><circle cx="141.023" cy="338.36" r="117.472" style="fill:#f8b5cb" transform="matrix(.997276 0 0 1.00556 10.0024 -14.823)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 -3.15847 0)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 -3.15847 115.914)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 148.43 115.914)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 148.851 0)"/><circle cx="805.557" cy="336.915" r="118.199" style="fill:#8d8d8d" transform="matrix(.99196 0 0 1 3.36978 -10.2458)"/><circle cx="805.557" cy="336.915" r="118.199" style="fill:#8d8d8d" transform="matrix(.99196 0 0 1 255.633 -10.2458)"/><path d="M680.282 124.808h-68.093v390.325h68.081v-28.23H640V153.228h40.282v-28.42Z" style="fill:#303030"/><path d="M680.282 124.808h-68.093v390.325h68.081v-28.23H640V153.228h40.282v-28.42Z" style="fill:#303030" transform="matrix(-1 0 0 1 1857.19 0)"/></svg>

Before

Width:  |  Height:  |  Size: 1.2 KiB

After

Width:  |  Height:  |  Size: 1.2 KiB

View File

Before

Width:  |  Height:  |  Size: 49 KiB

After

Width:  |  Height:  |  Size: 49 KiB

File diff suppressed because one or more lines are too long

Before

Width:  |  Height:  |  Size: 7.8 KiB

After

Width:  |  Height:  |  Size: 7.8 KiB

View File

@@ -9,9 +9,38 @@ When using ACL's the User borders are no longer applied. All machines
whichever the User have the ability to communicate with other hosts as
long as the ACL's permits this exchange.
## ACLs use case example
## ACL Setup
Let's build an example use case for a small business (It may be the place where
To enable and configure ACLs in Headscale, you need to specify the path to your ACL policy file in the `policy.path` key in `config.yaml`.
Your ACL policy file must be formatted using [huJSON](https://github.com/tailscale/hujson).
Info on how these policies are written can be found
[here](https://tailscale.com/kb/1018/acls/).
Please reload or restart Headscale after updating the ACL file. Headscale may be reloaded either via its systemd service
(`sudo systemctl reload headscale`) or by sending a SIGHUP signal (`sudo kill -HUP $(pidof headscale)`) to the main
process. Headscale logs the result of ACL policy processing after each reload.
## Simple Examples
- [**Allow All**](https://tailscale.com/kb/1192/acl-samples#allow-all-default-acl): If you define an ACL file but completely omit the `"acls"` field from its content, Headscale will default to an "allow all" policy. This means all devices connected to your tailnet will be able to communicate freely with each other.
```json
{}
```
- [**Deny All**](https://tailscale.com/kb/1192/acl-samples#deny-all): To prevent all communication within your tailnet, you can include an empty array for the `"acls"` field in your policy file.
```json
{
"acls": []
}
```
## Complex Example
Let's build a more complex example use case for a small business (It may be the place where
ACL's are the most useful).
We have a small company with a boss, an admin, two developers and an intern.
@@ -36,11 +65,7 @@ servers.
- billing.internal
- router.internal
![ACL implementation example](../images/headscale-acl-network.png)
## ACL setup
ACLs have to be written in [huJSON](https://github.com/tailscale/hujson).
![ACL implementation example](../assets/images/headscale-acl-network.png)
When [registering the servers](../usage/getting-started.md#register-a-node) we
will need to add the flag `--advertise-tags=tag:<tag1>,tag:<tag2>`, and the user
@@ -49,14 +74,6 @@ tags to a server they can register, the check of the tags is done on headscale
server and only valid tags are applied. A tag is valid if the user that is
registering it is allowed to do it.
To use ACLs in headscale, you must edit your `config.yaml` file. In there you will find a `policy.path` parameter. This
will need to point to your ACL file. More info on how these policies are written can be found
[here](https://tailscale.com/kb/1018/acls/).
Please reload or restart Headscale after updating the ACL file. Headscale may be reloaded either via its systemd service
(`sudo systemctl reload headscale`) or by sending a SIGHUP signal (`sudo kill -HUP $(pidof headscale)`) to the main
process. Headscale logs the result of ACL policy processing after each reload.
Here are the ACL's to implement the same permissions as above:
```json title="acl.json"
@@ -177,13 +194,95 @@ Here are the ACL's to implement the same permissions as above:
"dst": ["tag:dev-app-servers:80,443"]
},
// We still have to allow internal users communications since nothing guarantees that each user have
// their own users.
{ "action": "accept", "src": ["boss@"], "dst": ["boss@:*"] },
{ "action": "accept", "src": ["dev1@"], "dst": ["dev1@:*"] },
{ "action": "accept", "src": ["dev2@"], "dst": ["dev2@:*"] },
{ "action": "accept", "src": ["admin1@"], "dst": ["admin1@:*"] },
{ "action": "accept", "src": ["intern1@"], "dst": ["intern1@:*"] }
// Allow users to access their own devices using autogroup:self (see below for more details about performance impact)
{
"action": "accept",
"src": ["autogroup:member"],
"dst": ["autogroup:self:*"]
}
]
}
```
## Autogroups
Headscale supports several autogroups that automatically include users, destinations, or devices with specific properties. Autogroups provide a convenient way to write ACL rules without manually listing individual users or devices.
### `autogroup:internet`
Allows access to the internet through [exit nodes](routes.md#exit-node). Can only be used in ACL destinations.
```json
{
"action": "accept",
"src": ["group:users"],
"dst": ["autogroup:internet:*"]
}
```
### `autogroup:member`
Includes all [personal (untagged) devices](registration.md/#identity-model).
```json
{
"action": "accept",
"src": ["autogroup:member"],
"dst": ["tag:prod-app-servers:80,443"]
}
```
### `autogroup:tagged`
Includes all devices that [have at least one tag](registration.md/#identity-model).
```json
{
"action": "accept",
"src": ["autogroup:tagged"],
"dst": ["tag:monitoring:9090"]
}
```
### `autogroup:self`
!!! warning "The current implementation of `autogroup:self` is inefficient"
Includes devices where the same user is authenticated on both the source and destination. Does not include tagged devices. Can only be used in ACL destinations.
```json
{
"action": "accept",
"src": ["autogroup:member"],
"dst": ["autogroup:self:*"]
}
```
*Using `autogroup:self` may cause performance degradation on the Headscale coordinator server in large deployments, as filter rules must be compiled per-node rather than globally and the current implementation is not very efficient.*
If you experience performance issues, consider using more specific ACL rules or limiting the use of `autogroup:self`.
```json
{
// The following rules allow internal users to communicate with their
// own nodes in case autogroup:self is causing performance issues.
{ "action": "accept", "src": ["boss@"], "dst": ["boss@:*"] },
{ "action": "accept", "src": ["dev1@"], "dst": ["dev1@:*"] },
{ "action": "accept", "src": ["dev2@"], "dst": ["dev2@:*"] },
{ "action": "accept", "src": ["admin1@"], "dst": ["admin1@:*"] },
{ "action": "accept", "src": ["intern1@"], "dst": ["intern1@:*"] }
}
```
### `autogroup:nonroot`
Used in Tailscale SSH rules to allow access to any user except root. Can only be used in the `users` field of SSH rules.
```json
{
"action": "accept",
"src": ["autogroup:member"],
"dst": ["autogroup:self"],
"users": ["autogroup:nonroot"]
}
```

129
docs/ref/api.md Normal file
View File

@@ -0,0 +1,129 @@
# API
Headscale provides a [HTTP REST API](#rest-api) and a [gRPC interface](#grpc) which may be used to integrate a [web
interface](integration/web-ui.md), [remote control Headscale](#setup-remote-control) or provide a base for custom
integration and tooling.
Both interfaces require a valid API key before use. To create an API key, log into your Headscale server and generate
one with the default expiration of 90 days:
```shell
headscale apikeys create
```
Copy the output of the command and save it for later. Please note that you can not retrieve an API key again. If the API
key is lost, expire the old one, and create a new one.
To list the API keys currently associated with the server:
```shell
headscale apikeys list
```
and to expire an API key:
```shell
headscale apikeys expire --prefix <PREFIX>
```
## REST API
- API endpoint: `/api/v1`, e.g. `https://headscale.example.com/api/v1`
- Documentation: `/swagger`, e.g. `https://headscale.example.com/swagger`
- Headscale Version: `/version`, e.g. `https://headscale.example.com/version`
- Authenticate using HTTP Bearer authentication by sending the [API key](#api) with the HTTP `Authorization: Bearer <API_KEY>` header.
Start by [creating an API key](#api) and test it with the examples below. Read the API documentation provided by your
Headscale server at `/swagger` for details.
=== "Get details for all users"
```console
curl -H "Authorization: Bearer <API_KEY>" \
https://headscale.example.com/api/v1/user
```
=== "Get details for user 'bob'"
```console
curl -H "Authorization: Bearer <API_KEY>" \
https://headscale.example.com/api/v1/user?name=bob
```
=== "Register a node"
```console
curl -H "Authorization: Bearer <API_KEY>" \
--json '{"user": "<USER>", "authId": "AUTH_ID>"}' \
https://headscale.example.com/api/v1/auth/register
```
## gRPC
The gRPC interface can be used to control a Headscale instance from a remote machine with the `headscale` binary.
### Prerequisite
- A workstation to run `headscale` (any supported platform, e.g. Linux).
- A Headscale server with gRPC enabled.
- Connections to the gRPC port (default: `50443`) are allowed.
- Remote access requires an encrypted connection via TLS.
- An [API key](#api) to authenticate with the Headscale server.
### Setup remote control
1. Download the [`headscale` binary from GitHub's release page](https://github.com/juanfont/headscale/releases). Make
sure to use the same version as on the server.
1. Put the binary somewhere in your `PATH`, e.g. `/usr/local/bin/headscale`
1. Make `headscale` executable: `chmod +x /usr/local/bin/headscale`
1. [Create an API key](#api) on the Headscale server.
1. Provide the connection parameters for the remote Headscale server either via a minimal YAML configuration file or
via environment variables:
=== "Minimal YAML configuration file"
```yaml title="config.yaml"
cli:
address: <HEADSCALE_ADDRESS>:<PORT>
api_key: <API_KEY>
```
=== "Environment variables"
```shell
export HEADSCALE_CLI_ADDRESS="<HEADSCALE_ADDRESS>:<PORT>"
export HEADSCALE_CLI_API_KEY="<API_KEY>"
```
This instructs the `headscale` binary to connect to a remote instance at `<HEADSCALE_ADDRESS>:<PORT>`, instead of
connecting to the local instance.
1. Test the connection by listing all nodes:
```shell
headscale nodes list
```
You should now be able to see a list of your nodes from your workstation, and you can
now control the Headscale server from your workstation.
### Behind a proxy
It's possible to run the gRPC remote endpoint behind a reverse proxy, like Nginx, and have it run on the _same_ port as Headscale.
While this is _not a supported_ feature, an example on how this can be set up on
[NixOS is shown here](https://github.com/kradalby/dotfiles/blob/4489cdbb19cddfbfae82cd70448a38fde5a76711/machines/headscale.oracldn/headscale.nix#L61-L91).
### Troubleshooting
- Make sure you have the _same_ Headscale version on your server and workstation.
- Ensure that connections to the gRPC port are allowed.
- Verify that your TLS certificate is valid and trusted.
- If you don't have access to a trusted certificate (e.g. from Let's Encrypt), either:
- Add your self-signed certificate to the trust store of your OS _or_
- Disable certificate verification by either setting `cli.insecure: true` in the configuration file or by setting
`HEADSCALE_CLI_INSECURE=1` via an environment variable. We do **not** recommend to disable certificate validation.

View File

@@ -17,8 +17,8 @@
=== "View on GitHub"
* Development version: <https://github.com/juanfont/headscale/blob/main/config-example.yaml>
* Version {{ headscale.version }}: <https://github.com/juanfont/headscale/blob/v{{ headscale.version }}/config-example.yaml>
- Development version: <https://github.com/juanfont/headscale/blob/main/config-example.yaml>
- Version {{ headscale.version }}: https://github.com/juanfont/headscale/blob/v{{ headscale.version }}/config-example.yaml
=== "Download with `wget`"

118
docs/ref/debug.md Normal file
View File

@@ -0,0 +1,118 @@
# Debugging and troubleshooting
Headscale and Tailscale provide debug and introspection capabilities that can be helpful when things don't work as
expected. This page explains some debugging techniques to help pinpoint problems.
Please also have a look at [Tailscale's Troubleshooting guide](https://tailscale.com/kb/1023/troubleshooting). It offers
a many tips and suggestions to troubleshoot common issues.
## Tailscale
The Tailscale client itself offers many commands to introspect its state as well as the state of the network:
- [Check local network conditions](https://tailscale.com/kb/1080/cli#netcheck): `tailscale netcheck`
- [Get the client status](https://tailscale.com/kb/1080/cli#status): `tailscale status --json`
- [Get DNS status](https://tailscale.com/kb/1080/cli#dns): `tailscale dns status --all`
- Client logs: `tailscale debug daemon-logs`
- Client netmap: `tailscale debug netmap`
- Test DERP connection: `tailscale debug derp headscale`
- And many more, see: `tailscale debug --help`
Many of the commands are helpful when trying to understand differences between Headscale and Tailscale SaaS.
## Headscale
### Application logging
The log levels `debug` and `trace` can be useful to get more information from Headscale.
```yaml hl_lines="3"
log:
# Valid log levels: panic, fatal, error, warn, info, debug, trace
level: debug
```
### Database logging
The database debug mode logs all database queries. Enable it to see how Headscale interacts with its database. This also
requires the application log level to be set to either `debug` or `trace`.
```yaml hl_lines="3 7"
database:
# Enable debug mode. This setting requires the log.level to be set to "debug" or "trace".
debug: false
log:
# Valid log levels: panic, fatal, error, warn, info, debug, trace
level: debug
```
### Metrics and debug endpoint
Headscale provides a metrics and debug endpoint. It allows to introspect different aspects such as:
- Information about the Go runtime, memory usage and statistics
- Connected nodes and pending registrations
- Active ACLs, filters and SSH policy
- Current DERPMap
- Prometheus metrics
!!! warning "Keep the metrics and debug endpoint private"
The listen address and port can be configured with the `metrics_listen_addr` variable in the [configuration
file](./configuration.md). By default it listens on localhost, port 9090.
Keep the metrics and debug endpoint private to your internal network and don't expose it to the Internet.
The metrics and debug interface can be disabled completely by setting `metrics_listen_addr: null` in the
[configuration file](./configuration.md).
Query metrics via <http://localhost:9090/metrics> and get an overview of available debug information via
<http://localhost:9090/debug/>. Metrics may be queried from outside localhost but the debug interface is subject to
additional protection despite listening on all interfaces.
=== "Direct access"
Access the debug interface directly on the server where Headscale is installed.
```console
curl http://localhost:9090/debug/
```
=== "SSH port forwarding"
Use SSH port forwarding to forward Headscale's metrics and debug port to your device.
```console
ssh <HEADSCALE_SERVER> -L 9090:localhost:9090
```
Access the debug interface on your device by opening <http://localhost:9090/debug/> in your web browser.
=== "Via debug key"
The access control of the debug interface supports the use of a debug key. Traffic is accepted if the path to a
debug key is set via the environment variable `TS_DEBUG_KEY_PATH` and the debug key sent as value for `debugkey`
parameter with each request.
```console
openssl rand -hex 32 | tee debugkey.txt
export TS_DEBUG_KEY_PATH=debugkey.txt
headscale serve
```
Access the debug interface on your device by opening `http://<IP_OF_HEADSCALE>:9090/debug/?debugkey=<DEBUG_KEY>` in
your web browser. The `debugkey` parameter must be sent with every request.
=== "Via debug IP address"
The debug endpoint expects traffic from localhost. A different debug IP address may be configured by setting the
`TS_ALLOW_DEBUG_IP` environment variable before starting Headscale. The debug IP address is ignored when the HTTP
header `X-Forwarded-For` is present.
```console
export TS_ALLOW_DEBUG_IP=192.168.0.10 # IP address of your device
headscale serve
```
Access the debug interface on your device by opening `http://<IP_OF_HEADSCALE>:9090/debug/` in your web browser.

174
docs/ref/derp.md Normal file
View File

@@ -0,0 +1,174 @@
# DERP
A [DERP (Designated Encrypted Relay for Packets) server](https://tailscale.com/kb/1232/derp-servers) is mainly used to
relay traffic between two nodes in case a direct connection can't be established. Headscale provides an embedded DERP
server to ensure seamless connectivity between nodes.
## Configuration
DERP related settings are configured within the `derp` section of the [configuration file](./configuration.md). The
following sections only use a few of the available settings, check the [example configuration](./configuration.md) for
all available configuration options.
### Enable embedded DERP
Headscale ships with an embedded DERP server which allows to run your own self-hosted DERP server easily. The embedded
DERP server is disabled by default and needs to be enabled. In addition, you should configure the public IPv4 and public
IPv6 address of your Headscale server for improved connection stability:
```yaml title="config.yaml" hl_lines="3-5"
derp:
server:
enabled: true
ipv4: 198.51.100.1
ipv6: 2001:db8::1
```
Keep in mind that [additional ports are needed to run a DERP server](../setup/requirements.md#ports-in-use). Besides
relaying traffic, it also uses STUN (udp/3478) to help clients discover their public IP addresses and perform NAT
traversal. [Check DERP server connectivity](#check-derp-server-connectivity) to see if everything works.
### Remove Tailscale's DERP servers
Once enabled, Headscale's embedded DERP is added to the list of free-to-use [DERP
servers](https://tailscale.com/kb/1232/derp-servers) offered by Tailscale Inc. To only use Headscale's embedded DERP
server, disable the loading of the default DERP map:
```yaml title="config.yaml" hl_lines="6"
derp:
server:
enabled: true
ipv4: 198.51.100.1
ipv6: 2001:db8::1
urls: []
```
!!! warning "Single point of failure"
Removing Tailscale's DERP servers means that there is now just a single DERP server available for clients. This is a
single point of failure and could hamper connectivity.
[Check DERP server connectivity](#check-derp-server-connectivity) with your embedded DERP server before removing
Tailscale's DERP servers.
### Customize DERP map
The DERP map offered to clients can be customized with a [dedicated YAML-configuration
file](https://github.com/juanfont/headscale/blob/main/derp-example.yaml). This allows to modify previously loaded DERP
maps fetched via URL or to offer your own, custom DERP servers to nodes.
=== "Remove specific DERP regions"
The free-to-use [DERP servers](https://tailscale.com/kb/1232/derp-servers) are organized into regions via a region
ID. You can explicitly disable a specific region by setting its region ID to `null`. The following sample
`derp.yaml` disables the New York DERP region (which has the region ID 1):
```yaml title="derp.yaml"
regions:
1: null
```
Use the following configuration to serve the default DERP map (excluding New York) to nodes:
```yaml title="config.yaml" hl_lines="6 7"
derp:
server:
enabled: false
urls:
- https://controlplane.tailscale.com/derpmap/default
paths:
- /etc/headscale/derp.yaml
```
=== "Provide custom DERP servers"
The following sample `derp.yaml` references two custom regions (`custom-east` with ID 900 and `custom-west` with ID 901)
with one custom DERP server in each region. Each DERP server offers DERP relay via HTTPS on tcp/443, support for captive
portal checks via HTTP on tcp/80 and STUN on udp/3478. See the definitions of
[DERPMap](https://pkg.go.dev/tailscale.com/tailcfg#DERPMap),
[DERPRegion](https://pkg.go.dev/tailscale.com/tailcfg#DERPRegion) and
[DERPNode](https://pkg.go.dev/tailscale.com/tailcfg#DERPNode) for all available options.
```yaml title="derp.yaml"
regions:
900:
regionid: 900
regioncode: custom-east
regionname: My region (east)
nodes:
- name: 900a
regionid: 900
hostname: derp900a.example.com
ipv4: 198.51.100.1
ipv6: 2001:db8::1
canport80: true
901:
regionid: 901
regioncode: custom-west
regionname: My Region (west)
nodes:
- name: 901a
regionid: 901
hostname: derp901a.example.com
ipv4: 198.51.100.2
ipv6: 2001:db8::2
canport80: true
```
Use the following configuration to only serve the two DERP servers from the above `derp.yaml`:
```yaml title="config.yaml" hl_lines="5 6"
derp:
server:
enabled: false
urls: []
paths:
- /etc/headscale/derp.yaml
```
Independent of the custom DERP map, you may choose to [enable the embedded DERP server and have it automatically added
to the custom DERP map](#enable-embedded-derp).
### Verify clients
Access to DERP serves can be restricted to nodes that are members of your Tailnet. Relay access is denied for unknown
clients.
=== "Embedded DERP"
Client verification is enabled by default.
```yaml title="config.yaml" hl_lines="3"
derp:
server:
verify_clients: true
```
=== "3rd-party DERP"
Tailscale's `derper` provides two parameters to configure client verification:
- Use the `-verify-client-url` parameter of the `derper` and point it towards the `/verify` endpoint of your
Headscale server (e.g `https://headscale.example.com/verify`). The DERP server will query your Headscale instance
as soon as a client connects with it to ask whether access should be allowed or denied. Access is allowed if
Headscale knows about the connecting client and denied otherwise.
- The parameter `-verify-client-url-fail-open` controls what should happen when the DERP server can't reach the
Headscale instance. By default, it will allow access if Headscale is unreachable.
## Check DERP server connectivity
Any Tailscale client may be used to introspect the DERP map and to check for connectivity issues with DERP servers.
- Display DERP map: `tailscale debug derp-map`
- Check connectivity with the embedded DERP[^1]:`tailscale debug derp headscale`
Additional DERP related metrics and information is available via the [metrics and debug
endpoint](./debug.md#metrics-and-debug-endpoint).
## Limitations
- The embedded DERP server can't be used for Tailscale's captive portal checks as it doesn't support the `/generate_204`
endpoint via HTTP on port tcp/80.
- There are no speed or throughput optimisations, the main purpose is to assist in node connectivity.
[^1]: This assumes that the default region code of the [configuration file](./configuration.md) is used.

View File

@@ -1,7 +1,7 @@
# DNS
Headscale supports [most DNS features](../about/features.md) from Tailscale. DNS related settings can be configured
within `dns` section of the [configuration file](./configuration.md).
within the `dns` section of the [configuration file](./configuration.md).
## Setting extra DNS records
@@ -23,9 +23,9 @@ hostname and port combination "http://hostname-in-magic-dns.myvpn.example.com:30
!!! warning "Limitations"
Currently, [only A and AAAA records are processed by Tailscale](https://github.com/tailscale/tailscale/blob/v1.78.3/ipn/ipnlocal/local.go#L4461-L4479).
Currently, [only A and AAAA records are processed by Tailscale](https://github.com/tailscale/tailscale/blob/v1.86.5/ipn/ipnlocal/node_backend.go#L662).
1. Configure extra DNS records using one of the available configuration options:
1. Configure extra DNS records using one of the available configuration options:
=== "Static entries, via `dns.extra_records`"
@@ -66,12 +66,12 @@ hostname and port combination "http://hostname-in-magic-dns.myvpn.example.com:30
!!! tip "Good to know"
* The `dns.extra_records_path` option in the [configuration file](./configuration.md) needs to reference the
- The `dns.extra_records_path` option in the [configuration file](./configuration.md) needs to reference the
JSON file containing extra DNS records.
* Be sure to "sort keys" and produce a stable output in case you generate the JSON file with a script.
- Be sure to "sort keys" and produce a stable output in case you generate the JSON file with a script.
Headscale uses a checksum to detect changes to the file and a stable output avoids unnecessary processing.
1. Verify that DNS records are properly set using the DNS querying tool of your choice:
1. Verify that DNS records are properly set using the DNS querying tool of your choice:
=== "Query with dig"
@@ -87,7 +87,7 @@ hostname and port combination "http://hostname-in-magic-dns.myvpn.example.com:30
100.64.0.3
```
1. Optional: Setup the reverse proxy
1. Optional: Setup the reverse proxy
The motivating example here was to be able to access internal monitoring services on the same host without
specifying a port, depicted as NGINX configuration snippet:

View File

@@ -13,7 +13,7 @@ Running headscale behind a reverse proxy is useful when running multiple applica
The reverse proxy MUST be configured to support WebSockets to communicate with Tailscale clients.
WebSockets support is also required when using the headscale embedded DERP server. In this case, you will also need to expose the UDP port used for STUN (by default, udp/3478). Please check our [config-example.yaml](https://github.com/juanfont/headscale/blob/main/config-example.yaml).
WebSockets support is also required when using the Headscale [embedded DERP server](../derp.md). In this case, you will also need to expose the UDP port used for STUN (by default, udp/3478). Please check our [config-example.yaml](https://github.com/juanfont/headscale/blob/main/config-example.yaml).
### Cloudflare

Some files were not shown because too many files have changed in this diff Show More