headscale/AGENTS.md
Kristoffer Dalby 157e3a30fc AGENTS.md: trim to behavioural guidance, drop deprecated sub-agent
Procedural content moves to cmd/hi/README.md and integration/README.md.
Stale references (poll.go:420, mapper/tail.go, notifier/,
quality-control-enforcer, validateAndNormalizeTags) are corrected or
removed.
2026-04-10 12:30:07 +01:00


AGENTS.md

Behavioural guidance for AI agents working in this repository. Reference material for complex procedures lives next to the code — integration testing is documented in cmd/hi/README.md and integration/README.md. Read those files before running tests or writing new ones.

Headscale is an open-source implementation of the Tailscale control server written in Go. It manages node registration, IP allocation, policy enforcement, and DERP routing for self-hosted tailnets.

Interaction Rules

These rules govern how you work in this repo. They are listed first because they shape every other decision.

Ask with comprehensive multiple-choice options

When you need to clarify intent, scope, or approach, use the AskUserQuestion tool (or a numbered list fallback) and present the user with a comprehensive set of options. Cover the likely branches explicitly and include an "other — please describe" escape.

  • Bad: "How should I handle expired nodes?"
  • Good: "How should expired nodes be handled? (a) Remain visible to peers but marked expired (current behaviour); (b) Hidden from peers entirely; (c) Hidden from peers but visible in admin API; (d) Other."

Open-ended questions waste a round trip and often produce a misaligned answer, so spell out the options up front.

Read the documented procedure before running complex commands

Before invoking any hi command, integration test, generator, or migration tool, read the referenced README in full — cmd/hi/README.md for running tests, integration/README.md for writing them. Never guess flags. If the procedure is not documented anywhere, ask the user rather than inventing one.

Map once, then act

Use Glob / Grep to understand file structure, then execute. Do not re-explore the same area to "double-check" once you have a plan. Do not re-read files you edited in this session — the harness tracks state for you.

Fail fast, report up

If a command fails twice with the same error, stop and report the exact error to the user with context. Do not loop through variants or "try one more thing". A repeated failure means your model of the problem is wrong.

Confirm scope for multi-file changes

Before touching more than three files, show the user which files will change and why. Use plan mode (ExitPlanMode) for non-trivial work.

Prefer editing existing files

Do not create new files unless strictly necessary. Do not generate helper abstractions, wrapper utilities, or "just in case" configuration. Three similar lines of code are better than a premature abstraction.

Quick Start

# Enter the nix dev shell (Go 1.26.1, buf, golangci-lint, prek)
nix develop

# Full development workflow: fmt + lint + test + build
make dev

# Individual targets
make build           # build the headscale binary
make test            # go test ./...
make fmt             # format Go, docs, proto
make lint            # lint Go, proto
make generate        # regenerate protobuf code (after changes to proto/)
make clean           # remove build artefacts

# Direct go test invocations
go test ./...
go test -race ./...

# Integration tests — read cmd/hi/README.md first
go run ./cmd/hi doctor
go run ./cmd/hi run "TestName"

Go 1.26.1 is the minimum supported version (per go.mod:3). nix develop pins the exact toolchain used in CI.

Pre-Commit with prek

prek installs git hooks that run the same checks as CI.

nix develop
prek install            # one-time setup
prek run                # run hooks on staged files
prek run --all-files    # run hooks on the full tree

Hooks cover: file hygiene (trailing whitespace, line endings, BOM), syntax validation (JSON/YAML/TOML/XML), merge-conflict markers, private key detection, nixpkgs-fmt, prettier, and golangci-lint via --new-from-rev=HEAD~1 (see .pre-commit-config.yaml:59). With an upstream remote that tracks main, a comparable manual invocation that lints everything changed since upstream/main is:

golangci-lint run --new-from-rev=upstream/main --timeout=5m --fix

git commit --no-verify is acceptable only for WIP commits on feature branches — never on main.

Project Layout

headscale/
├── cmd/
│   ├── headscale/    # Main headscale server binary
│   └── hi/           # Integration test runner (see cmd/hi/README.md)
├── hscontrol/        # Core control plane
├── integration/      # End-to-end Docker-based tests (see integration/README.md)
├── proto/            # Protocol buffer definitions
├── gen/              # Generated code (buf output — do not edit)
├── docs/             # User and ACL reference documentation
└── packaging/        # Distribution packaging

hscontrol/ packages

  • app.go, handlers.go, grpcv1.go, noise.go, auth.go, oidc.go, poll.go, metrics.go, debug.go, tailsql.go, platform_config.go — top-level server files
  • state/ — central coordinator (state.go) and the copy-on-write NodeStore (node_store.go). All cross-subsystem operations go through State.
  • db/ — GORM layer, migrations, schema. node.go, users.go, api_key.go, preauth_keys.go, ip.go, policy.go.
  • mapper/ — streaming batcher that distributes MapResponses to clients: batcher.go, node_conn.go, builder.go, mapper.go. Performance-critical.
  • policy/ holds the policy engine; policy/v2/ is the policy implementation. The top-level policy.go is thin wrappers. There is no v1 directory.
  • routes/, dns/, derp/, types/, util/, templates/, capver/ — routing, MagicDNS, relay, core types, helpers, client templates, capability versioning.
  • servertest/ — in-memory test harness for server-level tests that don't need Docker. Prefer this over integration/ when possible.
  • assets/ — embedded UI assets.

cmd/hi/ files

main.go, run.go, doctor.go, docker.go, cleanup.go, stats.go, README.md. Read cmd/hi/README.md before running any hi command.

Architecture Essentials

  • hscontrol/state/state.go is the central coordinator. Cross-cutting operations (node updates, policy evaluation, IP allocation) go through the State type, not directly to the database.
  • NodeStore in hscontrol/state/node_store.go is a copy-on-write in-memory cache backed by atomic.Pointer[Snapshot]. Every read is a pointer load; writes rebuild a new snapshot and atomically swap. It is the hot path for MapRequest processing and peer visibility.
  • The map-request sync point is State.UpdateNodeFromMapRequest() in hscontrol/state/state.go:2351. This is where Hostinfo changes, endpoint updates, and route advertisements land in the NodeStore.
  • Mapper subsystem streams MapResponses via batcher.go and node_conn.go. Changes here affect all connected clients.
  • Node registration flow: noise handshake (noise.go) → auth (auth.go) → state/DB persistence (state/, db/) → initial map (mapper/).
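
The copy-on-write behaviour described for NodeStore can be sketched as follows. This is a minimal illustration, not the code in hscontrol/state/node_store.go: the Store, Snapshot, Node, and NodeID types and the Get and Put methods are hypothetical stand-ins, and the only detail taken from the text above is the atomic.Pointer[Snapshot] design with lock-free reads and full-snapshot rebuilds on write.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// NodeID and Node are simplified, hypothetical stand-ins for the real types.
type NodeID uint64

type Node struct {
	ID       NodeID
	Hostname string
}

// Snapshot is an immutable view of all nodes. It is never mutated after
// it has been published.
type Snapshot struct {
	nodes map[NodeID]Node
}

// Store is a hypothetical copy-on-write store, not the real NodeStore.
type Store struct {
	mu   sync.Mutex               // serialises writers only
	snap atomic.Pointer[Snapshot] // readers load this without locking
}

func NewStore() *Store {
	s := &Store{}
	s.snap.Store(&Snapshot{nodes: map[NodeID]Node{}})
	return s
}

// Get is a single atomic pointer load followed by a map lookup.
func (s *Store) Get(id NodeID) (Node, bool) {
	n, ok := s.snap.Load().nodes[id]
	return n, ok
}

// Put copies the current snapshot, applies the change to the copy, and
// atomically swaps it in. The cost grows with the number of nodes.
func (s *Store) Put(n Node) {
	s.mu.Lock()
	defer s.mu.Unlock()

	old := s.snap.Load()
	next := make(map[NodeID]Node, len(old.nodes)+1)
	for id, existing := range old.nodes {
		next[id] = existing
	}
	next[n.ID] = n
	s.snap.Store(&Snapshot{nodes: next})
}

func main() {
	s := NewStore()
	s.Put(Node{ID: 1, Hostname: "alpha"})

	before := s.snap.Load() // a reader still holding the old snapshot
	s.Put(Node{ID: 2, Hostname: "beta"})

	_, inOld := before.nodes[2]
	_, inNew := s.Get(2)
	fmt.Println(inOld, inNew) // false true
}
```

A reader that loaded the pointer before a write keeps a consistent view, which is why reads need no lock. The full map copy in Put is the reason write-heavy changes on this path should be measured first.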

Database Migration Rules

These rules are load-bearing — violating them corrupts production databases. The migrationsRequiringFKDisabled map in hscontrol/db/db.go:962 is frozen as of 2025-07-02 (see the comment at db.go:989). All new migrations must:

  1. Never reorder existing migrations. Migration order is immutable once committed.
  2. Only add new migrations to the end of the migrations array.
  3. Never disable foreign keys. No new entries in migrationsRequiringFKDisabled.
  4. Use the migration ID format YYYYMMDDHHMM-short-description (timestamp + descriptive suffix). Example: 202602201200-clear-tagged-node-user-id.
  5. Never rename columns that later migrations reference. Let AutoMigrate create a new column if needed.
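
Rules 1, 2, and 4 lend themselves to a mechanical check. The sketch below is illustrative only: validMigrationID and appendOnly are hypothetical helpers that do not exist in hscontrol/db, and the lowercase, hyphen-separated suffix pattern is an assumption inferred from the single example ID above.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// migrationIDPattern matches a 12-digit timestamp, a hyphen, and a
// lowercase hyphen-separated suffix. The suffix character set is an
// assumption based on the example ID in the rules above.
var migrationIDPattern = regexp.MustCompile(`^(\d{12})-[a-z0-9]+(?:-[a-z0-9]+)*$`)

// validMigrationID is a hypothetical helper, not part of hscontrol/db.
func validMigrationID(id string) bool {
	m := migrationIDPattern.FindStringSubmatch(id)
	if m == nil {
		return false
	}
	// Reject 12-digit prefixes that are not a real date and time.
	_, err := time.Parse("200601021504", m[1])
	return err == nil
}

// appendOnly reports whether next keeps every ID from prev in the same
// position and only adds new IDs at the end.
func appendOnly(prev, next []string) bool {
	if len(next) < len(prev) {
		return false
	}
	for i, id := range prev {
		if next[i] != id {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(validMigrationID("202602201200-clear-tagged-node-user-id")) // true
	fmt.Println(validMigrationID("20260220-clear-tagged-node-user-id"))     // false
	fmt.Println(validMigrationID("202613201200-bad-month"))                 // false

	prev := []string{"a", "b"}
	fmt.Println(appendOnly(prev, []string{"a", "b", "c"})) // true
	fmt.Println(appendOnly(prev, []string{"b", "a", "c"})) // false
}
```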

Tags-as-Identity

Headscale enforces tags XOR user ownership: every node is either tagged (owned by tags) or user-owned (owned by a user namespace), never both. This is a load-bearing architectural invariant.

  • Use node.IsTagged() (hscontrol/types/node.go:221) to determine ownership, not node.UserID().Valid(). A tagged node may still have UserID set for "created by" tracking — IsTagged() is authoritative.
  • IsUserOwned() (node.go:227) returns !IsTagged().
  • Tagged nodes are presented to Tailscale as the special TaggedDevices user (hscontrol/types/users.go, ID 2147455555).
  • SetTags validation is enforced by validateNodeOwnership() in hscontrol/state/tags.go.
  • Examples and edge cases live in hscontrol/types/node_tags_test.go and hscontrol/grpcv1_test.go (TestSetTags_*).

Don't do this:

if node.UserID().Valid() { /* assume user-owned */ }       // WRONG
if node.UserID().Valid() && !node.IsTagged() { /* ok */ }  // correct
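
To see why the UserID check alone misleads, here is a self-contained sketch. The node struct, its fields, and hasUserID are hypothetical simplifications, not the types in hscontrol/types/node.go, and treating "has at least one tag" as the definition of IsTagged is an assumption made for illustration.

```go
package main

import "fmt"

// node is a simplified stand-in for the real node type. The field names
// are assumptions made for this example.
type node struct {
	userID *uint    // may stay set on tagged nodes for "created by" tracking
	tags   []string // non-empty means the node is owned by its tags
}

// IsTagged decides ownership in this sketch: any tag makes the node tagged.
func (n node) IsTagged() bool { return len(n.tags) > 0 }

// IsUserOwned is the exact negation of IsTagged, as described above.
func (n node) IsUserOwned() bool { return !n.IsTagged() }

// hasUserID is the check the text warns against using for ownership.
func (n node) hasUserID() bool { return n.userID != nil }

func main() {
	creator := uint(7)
	tagged := node{userID: &creator, tags: []string{"tag:server"}}
	owned := node{userID: &creator}

	for _, n := range []node{tagged, owned} {
		fmt.Printf("hasUserID=%v IsTagged=%v IsUserOwned=%v\n",
			n.hasUserID(), n.IsTagged(), n.IsUserOwned())
	}
	// hasUserID=true IsTagged=true IsUserOwned=false
	// hasUserID=true IsTagged=false IsUserOwned=true
}
```

Both nodes carry a user ID, so only IsTagged separates them.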

Policy Engine

hscontrol/policy/v2/policy.go is the policy implementation. The top-level hscontrol/policy/policy.go contains only wrapper functions around v2. There is no v1 directory.

Key concepts an agent will encounter:

  • Autogroups: autogroup:self, autogroup:member, autogroup:internet
  • Tag owners: IP-based authorization for who can claim a tag
  • Route approvals: auto-approval of subnet routes by policy
  • SSH policies: SSH access control via grants
  • HuJSON parsing for policy files

For usage examples, read hscontrol/policy/v2/policy_test.go. For ACL reference documentation, see docs/.

Integration Testing

Before running any hi command, read cmd/hi/README.md in full. Guessing at hi flags leads to broken runs and stale containers.

Test-authoring patterns (EventuallyWithT, IntegrationSkip, helper variants, scenario setup) are documented in integration/README.md.

Key reminders:

  • Integration test functions must start with IntegrationSkip(t).
  • External calls (client.Status, headscale.ListNodes, etc.) belong inside EventuallyWithT; state-mutating commands (tailscale set) must not.
  • Tests generate ~100 MB of logs per run under control_logs/{runID}/. Prune old runs if disk is tight.
  • Flakes are almost always code, not infrastructure. Read hs-*.stderr.log before blaming Docker.

Code Conventions

  • Commit messages follow Go-style package: imperative description. Recent examples from git log:

    • db: scope DestroyUser to only delete the target user's pre-auth keys
    • state: fix policy change race in UpdateNodeFromMapRequest
    • integration: fix ACL tests for address-family-specific resolve

    Not Conventional Commits. No feat:/chore:/docs: prefixes.

  • Protobuf regeneration: changes under proto/ require make generate (which runs buf generate) and should land in a separate commit from the callers that use the regenerated types.

  • Formatting is enforced by golangci-lint with golines (width 88) and gofumpt. Run make fmt or rely on the pre-commit hook.

  • Logging uses zerolog. Prefer single-line chains (log.Info().Str(...).Msg(...)). For 4+ fields or conditional fields, build incrementally and reassign the event variable: e = e.Str("k", v). Forgetting to reassign silently drops the field.

  • Tests: prefer hscontrol/servertest/ for server-level tests that don't need Docker — faster than full integration tests.

Gotchas

  • Database: SQLite for local dev, PostgreSQL for integration-heavy tests (go run ./cmd/hi run "..." --postgres). Some race conditions only surface on one backend.
  • NodeStore writes rebuild a full snapshot. Measure before changing hot-path code.
  • .claude/agents/ is deprecated. Do not create new agent files there. Put behavioural guidance in this file and procedural guidance in the nearest README.
  • Do not edit gen/ — it is regenerated from proto/ by make generate.
  • Proto changes + code changes should be two commits, not one.