Commit Graph

629 Commits

Author SHA1 Message Date
Kristoffer Dalby
d6dfdc100c hscontrol: route hostname handling through dnsname and NodeStore
Ingest (registration and MapRequest updates) now calls
dnsname.SanitizeHostname directly and lets NodeStore auto-bump on
collision. Admin rename uses dnsname.ValidLabel + SetGivenName so
conflicts are surfaced to the caller instead of silently mutated.

Three duplicate invalidDNSRegex definitions, the old NormaliseHostname
and ValidateHostname helpers, EnsureHostname, InvalidString,
ApplyHostnameFromHostInfo, GivenNameHasBeenChanged, generateGivenName
and EnsureUniqueGivenName are removed along with their tests.
ValidateHostname's username half is retained as ValidateUsername for
users.go.

The SaaS-matching collision rule replaces the random "invalid-xxxxxx"
fallback and the 8-character hash suffix; the empty-input fallback is
the literal "node". TestUpdateHostnameFromClient now exercises the
rewrite end-to-end with awkward macOS/Windows names.

Fixes #3188
Fixes #2926
Fixes #2343
Fixes #2762
Fixes #2449
Updates #2177
Updates #2121
Updates #363
2026-04-18 15:12:21 +01:00
Kristoffer Dalby
a2c3ac095e state: auto-bump GivenName on collision and add SetGivenName
NodeStore's writer goroutine now resolves GivenName collisions inside
applyBatch: on PutNode/UpdateNode the landing label gets -N appended
until unique, matching Tailscale SaaS. Empty labels fall back to the
literal "node".

SetGivenName exposes the admin-rename path: validates via
dnsname.ValidLabel and rejects on collision with ErrGivenNameTaken,
so renames do not silently rewrite behind the caller.

Updates #3188
Updates #2926
Updates #2343
Updates #2762
2026-04-18 15:12:21 +01:00
Florian Preinstorfer
f1494a32ce Update links to Tailscale documentation 2026-04-18 09:33:41 +02:00
Kristoffer Dalby
436d3db28e servertest: add dynamic HA failover tests
Existing HA tests verify server-side primary election; these add
end-to-end assertions from a viewer client's perspective, that marking
the primary unhealthy or revoking its approved route propagates through
the netmap so the viewer sees the flip.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
978f1e3947 state: tie-break ResolveNode by GivenName then lowest NodeID
Resolve by GivenName (unique per tailnet) before Hostname (client-
reported, may collide); within each pass, pick the lowest NodeID so
results are deterministic across NodeStore snapshot iterations.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
2530d86f1b change: document PingRequest merge first-wins foot-gun
change.Merge keeps the first PingRequest seen when merging, which
means a later probe's callback URL is silently dropped if a stale merge
is in-flight. Document the contract at Merge and at doPing so callers
know to serialise probes.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
427b2f15ee matcher: clarify DestsIsTheInternet single-family semantics
DestsIsTheInternet now reports the internet when either family's /0
is contained (0.0.0.0/0 or ::/0), matching what operators expect when
they write the /0 directly. Also documents MatchFromStrings fail-open.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
93e8c7285f debug: explain URLIsNoise choice in ping callback
The /debug/ping callback hits /machine/ping-response on the main
TLS router, not the noise chain, so URLIsNoise stays false. Document
this at the emit site to prevent accidental changes.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
842f36225e state: drain pending pings on Close
Blocked callers waiting on a pingTracker response channel would
hang forever if the server Close()d mid-probe. Drain the pending map on
Close so those goroutines unblock and exit cleanly.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
0567cb6da3 app: add security headers middleware
X-Frame-Options: DENY and frame-ancestors 'none' stop clickjacking
of OIDC, register-confirm, and debug HTML pages. nosniff and no-referrer
are cheap defence-in-depth for the same surfaces.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
5a7cafdf85 noise: reject non-HEAD on PingResponseHandler
chi routes only HEAD to the handler, but assert explicitly so a
future router config change cannot silently accept GET/POST and leak
latency bytes or side-effects.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
f3eb9a7bba templates: escape query value in ping page
elem-go does not escape attribute values, so the raw query reaches
the rendered HTML verbatim. Pre-escape with html.EscapeString to prevent
reflected XSS.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
164d659dd2 servertest: add TestViaGrantHACompat for via+HA compat tests
Data-driven tests for via grants combined with HA primary routes:
crossed via tags on same prefix, mixed via+regular across HA pairs,
four-way HA, and the kitchen-sink scenario. Each case uses an inline
topology captured from SaaS.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
7d104b8c8d servertest: add via grant map compat tests
End-to-end exercise of via-grant compilation against SaaS captures:
peer visibility, AllowedIPs, PrimaryRoutes, and per-rule src/dst
reachability from each viewer's perspective.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
a7c9721faa policy/v2: overhaul compat test infrastructure
Reworks the ACL/routes/grant/SSH compat harnesses to read
testcapture.Capture typed files, per-scenario topologies, strict error
wording match, and shared helpers. Surfaces policy-engine drift against
Tailscale SaaS.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
f34dec2754 testcapture: add typed capture format package
Typed Capture/Input/Node/Topology structs for golden SaaS captures.
Schema drift between the tscap capture tool and headscale now becomes a
compile error instead of a silent test pass.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
1059c678c4 hscontrol/types: silence zerolog by default in tests
Tests were dumping megabytes of zerolog output on failure; silence
at init and let individual tests opt in via SetGlobalLevel when they need
log-driven assertions.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
affaa1a31d policy/v2: align SSH check action with SaaS wire format
SSH check rules now emit CheckPeriod in seconds (matching
Tailscale SaaS) instead of nanoseconds. Adds golden compat tests covering
accept/check modes.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
ded51a4d30 policyutil: fix reduceCapGrantRule and add route reduction
reduceCapGrantRule was dropping rules whose CapGrant IPs overlap a
subnet route; treat subnet routes as part of node identity so those rules
survive reduction. ReduceFilterRules now also reduces route-reachable
destinations.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
b051e7b2bc policy/v2: wire PolicyManager through compiledGrant
Threads PolicyManager into compiledGrant so via grants resolve
viewer identity at compile time instead of re-resolving per MapRequest.
Adds a matchersForNodeMap cache invalidated on policy reload and on node
add/remove.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
b01e67e8e5 types: consider subnet routes as source identity in ACL matching
CanAccess now treats a node's advertised subnet routes as part of
its source identity, so an ACL granting the subnet-owner as source lets
traffic from the subnet through. Matches SaaS semantics.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
f49c42e716 testdata: add SaaS captures for compat tests
Golden captures of SaaS filter-rules and netmaps across the ACL,
grant, routes, and SSH corpora. These back the data-driven compat tests
that verify headscale's policy output against Tailscale SaaS verbatim.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
0378e2d2c6 servertest: add HA health probing tests
Five scenarios: healthy probes, failover on unhealthy primary,
no-flap recovery, connect clears unhealthy, no-op without HA routes.

Updates #2129
Updates #2902
2026-04-16 15:10:56 +01:00
Kristoffer Dalby
8a97dd134b app: wire HA health prober into scheduled tasks
Run the prober on a ticker in scheduledTasks. Enabled by default
(10s interval, 5s timeout). No-op when no HA routes exist.

Fixes #2129
Fixes #2902
2026-04-16 15:10:56 +01:00
Kristoffer Dalby
90e65ccd63 state: add HA health prober
Ping HA subnet routers each probe cycle and mark unresponsive nodes
unhealthy. Reconnecting a node clears its unhealthy state since the
fresh Noise session proves basic connectivity.

Updates #2129
Updates #2902
2026-04-16 15:10:56 +01:00
Kristoffer Dalby
786ce2dce8 routes: add health dimension to HA primary route election
Track unhealthy nodes in PrimaryRoutes so primary election skips them.
When all nodes for a prefix are unhealthy, keep the first as a degraded
primary rather than dropping the route entirely.

Anti-flap is built in: a recovered node becomes standby, not primary,
because updatePrimaryLocked keeps the current primary when still
available and healthy.

Updates #2129
Updates #2902
2026-04-16 15:10:56 +01:00
Kristoffer Dalby
c9dbea5c18 templates: improve ping page spacing and design system usage
- Remove redundant inline button/input styles that duplicate CSS
- Use CSS variables for input (dark mode support)
- Use A(), Ul(), Ol(), P() wrappers from general.go
- Add expandable explanation of what the ping tests
- Fix section spacing rhythm (spaceXL before results, space2XL
  before connected nodes)
- Add flex-wrap for mobile responsiveness
2026-04-15 10:53:35 +01:00
Kristoffer Dalby
0e5569c3fc templates: add detailsBox collapsible component
Add a reusable <details>/<summary> component to the shared design
system. Styled to match the existing card/box component family
(border, radius, CSS variables for dark mode).

Collapsed by default with a clickable summary line.
2026-04-15 10:53:35 +01:00
Kristoffer Dalby
97778c9930 all: add tests for PingRequest implementation
Unit tests for Change (IsEmpty, Merge, Type, PingNode constructor),
ping tracker (register/complete/cancel lifecycle, concurrency, latency),
and end-to-end servertests exercising the full round-trip with real
controlclient.Direct instances.

Updates #2902
Updates #2129
2026-04-15 10:53:35 +01:00
Kristoffer Dalby
b113655b71 all: implement PingRequest for node connectivity checking
Implement tailcfg.PingRequest support so the control server can verify
whether a connected node is still reachable. This is the foundation for
faster offline detection (currently ~16min due to Go HTTP/2 TCP retransmit
behavior) and future C2N communication.

The server sends a PingRequest via MapResponse with a unique callback
URL. The Tailscale client responds with a HEAD request to that URL,
proving connectivity. Round-trip latency is measured.

Wire PingRequest through the Change → Batcher → MapResponse pipeline,
add a ping tracker on State for correlating requests with responses,
add ResolveNode for looking up nodes by ID/IP/hostname, and expose a
/debug/ping page (elem-go form UI) and /machine/ping-response endpoint.

Updates #2902
Updates #2129
2026-04-15 10:53:35 +01:00
Kristoffer Dalby
de5b1eab68 templates: use table layout for registration confirm details
Replace the bullet list of device details with a two-column table
for cleaner visual hierarchy. Labels are bold and left-aligned,
values right-aligned with subtle row separators. The machine key
value uses an inline code style.

Updates juanfont/headscale#3182
2026-04-13 17:23:47 +01:00
Kristoffer Dalby
f066d12153 assets: fix logo alignment and error icon centering
Tighten the SVG viewBox to the actual content bounding box and
remove hardcoded width/height attributes so the browser no longer
adds horizontal padding via preserveAspectRatio. The "h" wordmark
now left-aligns with the page content below it.

Replace the error icon SVG path (which had an off-center X) with
a simple circle + two crossed lines drawn from a centered viewBox.
Both icons now use fill="currentColor" for dark mode adaptation.

Updates juanfont/headscale#3182
2026-04-13 17:23:47 +01:00
Kristoffer Dalby
3918020551 templates: use CSS variables in all shared components
Replace hardcoded Go color constants with var(--hs-*) and
var(--md-*) CSS custom properties in externalLink, orDivider,
card, warningBox, downloadButton, and pageFooter. This ensures
all components follow the dark mode theme automatically.

Also switch pageFooter from div to semantic footer element and
simplify externalLink by letting CSS handle link styling.

Updates juanfont/headscale#3182
2026-04-13 17:23:47 +01:00
Kristoffer Dalby
93860a5c06 all: apply formatter changes 2026-04-13 17:23:47 +01:00
Kristoffer Dalby
814226f327 templates: improve accessibility, dark mode, and typography
Bump base font size from 0.8rem to 1rem (16px) to meet mobile
accessibility guidelines and avoid iOS auto-zoom on inputs.

Add CSS custom properties for all theme colors with a
prefers-color-scheme: dark media query so pages adapt to OS dark
mode. Component inline styles reference var(--hs-*) tokens so they
follow the scheme automatically.

Accessibility improvements:
- role="status" + aria-live="polite" on success boxes
- role="alert" + aria-live="assertive" on error boxes
- role="note" on warning boxes
- Visible focus rings via :focus-visible
- Link underlines (don't rely on color alone)
- SVG icons use currentColor for theme adaptation
- prefers-reduced-motion media query
- <main> landmark element wrapping page content
- Button styling with 44px min-height touch target
- List item spacing

Updates juanfont/headscale#3182
2026-04-13 17:23:47 +01:00
Kristoffer Dalby
78990491da oidc: render HTML error pages for browser-facing failures
Add httpUserError() alongside httpError() for browser-facing error
paths. It renders a styled HTML page using the AuthError template
instead of returning plain text. Technical error details stay in
server logs; the HTML page shows actionable messages derived from
the HTTP status code:

  401/403 → "You are not authorized. Please contact your administrator."
  410     → "Your session has expired. Please try again."
  400-499 → "The request could not be processed. Please try again."
  500+    → "Something went wrong. Please try again later."

Convert all httpError calls in oidc.go (OIDC callback, SSH check,
registration confirm) to httpUserError. Machine-facing endpoints
(noise, verify, key, health, debug) are unchanged.

Fixes juanfont/headscale#3182
2026-04-13 17:23:47 +01:00
Kristoffer Dalby
c15caff48c templates: add error box component and error page template
Add errorBox() and errorIcon() to the design system, mirroring the
existing successBox()/checkboxIcon() pattern with red error styling.
Extract error color constants from the inline values in statusMessage().

Add AuthError() template that renders a styled HTML error page using
the same HtmlStructure/mdTypesetBody/logo/footer as all other
browser-facing pages.

Updates juanfont/headscale#3182
2026-04-13 17:23:47 +01:00
Kristoffer Dalby
d66d3a4269 oidc: add confirmation page for node registration
Render an interstitial showing device hostname, OS, and machine-key
fingerprint before finalising OIDC registration. The user must POST
to /register/confirm/{auth_id} with a CSRF double-submit cookie.
Removes the TODO at oidc.go:201.
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
d5a4e6e36a debug: route statsviz through tsweb.Protected
Build the statsviz Server directly and wrap its Index/Ws handlers in
tsweb.Protected instead of calling statsviz.Register on the raw mux
which bypasses AllowDebugAccess.
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
8c6cb05ab4 noise: pass context to sshActionFollowUp
Select on ctx.Done() alongside auth.WaitForAuth() so the goroutine
exits promptly when the client disconnects instead of parking until
cache eviction.
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
42b8c779a0 hscontrol: limit /verify request body size
Wrap req.Body with io.LimitReader bounded to 4 KiB before
io.ReadAll. The DERP verify payload is a few hundred bytes.
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
a3c4ad2ca3 types: omit secret fields from JSON marshalling
Add json:"-" to PostgresConfig.Pass, OIDCConfig.ClientSecret, and
CLIConfig.APIKey so they are excluded from json.Marshal output
(e.g. the /debug/config endpoint).
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
0641771128 db: guard UsePreAuthKey with WHERE used=false
Add a row-level check so concurrent registrations with the same
single-use key cannot both succeed. Skip the call on
re-registration where the key is already marked used (#2830).
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
f7d8bb8b3f app: remove gRPC reflection from remote server
Reflection is a streaming RPC and bypasses the unary auth
interceptor on the remote (TCP) gRPC server. Remove it there;
the unix-socket server retains it for local debugging.
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
adb9467f60 oidc: validate state parameter length in callback
getCookieName sliced value[:6] unconditionally; a short state query
parameter caused a panic recovered by chi middleware. Reject states
shorter than cookieNamePrefixLen with 400.
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
41d70fe87b auth: check machine key on tailscaled-restart fast path
The #2862 restart path returned nodeToRegisterResponse after a
NodeKey-only lookup without verifying MachineKey. Add the same
check handleLogout already performs.
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
99767cf805 hscontrol: validate machine key and bind src/dst in SSH check handler
SSHActionHandler now verifies that the Noise session's machine key
matches the dst node before proceeding. The (src, dst) pair is
captured at hold-and-delegate time via a new SSHCheckBinding on
AuthRequest so sshActionFollowUp can verify the follow-up URL
matches. The OIDC non-registration callback requires the
authenticated user to own the src node before approving.
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
0d4f2293ff state: replace zcache with bounded LRU for auth cache
Replace zcache with golang-lru/v2/expirable for both the state auth
cache and the OIDC state cache. Add tuning.register_cache_max_entries
(default 1024) to cap the number of pending registration entries.

Introduce types.RegistrationData to replace caching a full *Node;
only the fields the registration callback path reads are retained.
Remove the dead HSDatabase.regCache field. Drop zgo.at/zcache/v2
from go.mod.
2026-04-10 14:09:57 +01:00
Kristoffer Dalby
3587225a88 mapper: fix phantom updateSentPeers on disconnected nodes
When send() is called on a node with zero active connections
(disconnected but kept for rapid reconnection), it returns nil
(success). handleNodeChange then calls updateSentPeers, recording
peers as delivered when no client received the data.

This corrupts lastSentPeers: future computePeerDiff calculations
produce wrong results because they compare against phantom state.
After reconnection, the node's initial map resets lastSentPeers,
but any changes processed during the disconnect window leave
stale entries that cause asymmetric peer visibility.

Return errNoActiveConnections from send() when there are no
connections. handleNodeChange treats this as a no-op (the change
was generated but not deliverable) and skips updateSentPeers,
keeping lastSentPeers consistent with what clients actually
received.
2026-04-10 13:18:56 +01:00
Kristoffer Dalby
9371b4ee28 mapper: fix empty Peers list not clearing client peer state
When a FullUpdate produces zero visible peers (e.g., a restrictive
policy isolates a node), the MapResponse has Peers: [] (empty
non-nil). The Tailscale client only processes Peers as a full
replacement when len(Peers) > 0 (controlclient/map.go:462), so an
empty list is silently ignored and stale peers persist.

This triggers when a FullUpdate() replaces a pending PolicyChange()
in the batcher. The PolicyChange would have used computePeerDiff to
send explicit PeersRemoved, but the FullUpdate goes through
buildFromChange which sets Peers: [] that the client ignores.

When a full update produces zero peers, compute the peer diff
against lastSentPeers and add explicit PeersRemoved entries so the
client correctly clears its stale peer state.
2026-04-10 13:18:56 +01:00