AGENTS.md
Behavioural guidance for AI agents working in this repository. Reference
material for complex procedures lives next to the code — integration
testing is documented in cmd/hi/README.md and
integration/README.md. Read those files
before running tests or writing new ones.
Headscale is an open-source implementation of the Tailscale control server written in Go. It manages node registration, IP allocation, policy enforcement, and DERP routing for self-hosted tailnets.
Interaction Rules
These rules govern how you work in this repo. They are listed first because they shape every other decision.
Ask with comprehensive multiple-choice options
When you need to clarify intent, scope, or approach, use the
AskUserQuestion tool (or a numbered list fallback) and present the user
with a comprehensive set of options. Cover the likely branches explicitly
and include an "other — please describe" escape.
- Bad: "How should I handle expired nodes?"
- Good: "How should expired nodes be handled? (a) Remain visible to peers but marked expired (current behaviour); (b) Hidden from peers entirely; (c) Hidden from peers but visible in admin API; (d) Other."
Open-ended questions waste a round trip and often produce a misaligned answer, so spend the effort on the options up front.
Read the documented procedure before running complex commands
Before invoking any hi command, integration test, generator, or
migration tool, read the referenced README in full —
cmd/hi/README.md for running tests, integration/README.md for
writing them. Never guess flags. If the procedure is not documented
anywhere, ask the user rather than inventing one.
Map once, then act
Use Glob / Grep to understand file structure, then execute. Do not
re-explore the same area to "double-check" once you have a plan. Do not
re-read files you edited in this session — the harness tracks state for
you.
Fail fast, report up
If a command fails twice with the same error, stop and report the exact error to the user with context. Do not loop through variants or "try one more thing". A repeated failure means your model of the problem is wrong.
Confirm scope for multi-file changes
Before touching more than three files, show the user which files will
change and why. Use plan mode (ExitPlanMode) for non-trivial work.
Prefer editing existing files
Do not create new files unless strictly necessary. Do not generate helper abstractions, wrapper utilities, or "just in case" configuration. Three similar lines of code is better than a premature abstraction.
Quick Start
# Enter the nix dev shell (Go 1.26.1, buf, golangci-lint, prek)
nix develop
# Full development workflow: fmt + lint + test + build
make dev
# Individual targets
make build # build the headscale binary
make test # go test ./...
make fmt # format Go, docs, proto
make lint # lint Go, proto
make generate # regenerate protobuf code (after changes to proto/)
make clean # remove build artefacts
# Direct go test invocations
go test ./...
go test -race ./...
# Integration tests — read cmd/hi/README.md first
go run ./cmd/hi doctor
go run ./cmd/hi run "TestName"
Go 1.26.1 minimum (per go.mod:3). nix develop pins the exact toolchain
used in CI.
Pre-Commit with prek
prek installs git hooks that run the same checks as CI.
nix develop
prek install # one-time setup
prek run # run hooks on staged files
prek run --all-files # run hooks on the full tree
Hooks cover: file hygiene (trailing whitespace, line endings, BOM),
syntax validation (JSON/YAML/TOML/XML), merge-conflict markers, private
key detection, nixpkgs-fmt, prettier, and golangci-lint via
--new-from-rev=HEAD~1 (see .pre-commit-config.yaml:59). A manual
invocation with an upstream/main remote is equivalent:
golangci-lint run --new-from-rev=upstream/main --timeout=5m --fix
git commit --no-verify is acceptable only for WIP commits on feature
branches — never on main.
Project Layout
headscale/
├── cmd/
│ ├── headscale/ # Main headscale server binary
│ └── hi/ # Integration test runner (see cmd/hi/README.md)
├── hscontrol/ # Core control plane
├── integration/ # End-to-end Docker-based tests (see integration/README.md)
├── proto/ # Protocol buffer definitions
├── gen/ # Generated code (buf output — do not edit)
├── docs/ # User and ACL reference documentation
└── packaging/ # Distribution packaging
hscontrol/ packages
- app.go, handlers.go, grpcv1.go, noise.go, auth.go, oidc.go, poll.go, metrics.go, debug.go, tailsql.go, platform_config.go: top-level server files.
- state/: central coordinator (state.go) and the copy-on-write NodeStore (node_store.go). All cross-subsystem operations go through State.
- db/: GORM layer, migrations, schema. node.go, users.go, api_key.go, preauth_keys.go, ip.go, policy.go.
- mapper/: streaming batcher that distributes MapResponses to clients: batcher.go, node_conn.go, builder.go, mapper.go. Performance-critical.
- policy/: policy/v2/ is the policy implementation. The top-level policy.go is thin wrappers. There is no v1 directory.
- routes/, dns/, derp/, types/, util/, templates/, capver/: routing, MagicDNS, relay, core types, helpers, client templates, capability versioning.
- servertest/: in-memory test harness for server-level tests that don't need Docker. Prefer this over integration/ when possible.
- assets/: embedded UI assets.
cmd/hi/ files
main.go, run.go, doctor.go, docker.go, cleanup.go, stats.go,
README.md. Read cmd/hi/README.md before running any hi command.
Architecture Essentials
- hscontrol/state/state.go is the central coordinator. Cross-cutting operations (node updates, policy evaluation, IP allocation) go through the State type, not directly to the database.
- NodeStore in hscontrol/state/node_store.go is a copy-on-write in-memory cache backed by atomic.Pointer[Snapshot]. Every read is a pointer load; writes rebuild a new snapshot and atomically swap. It is the hot path for MapRequest processing and peer visibility.
- The map-request sync point is State.UpdateNodeFromMapRequest() in hscontrol/state/state.go:2351. This is where Hostinfo changes, endpoint updates, and route advertisements land in the NodeStore.
- Mapper subsystem streams MapResponses via batcher.go and node_conn.go. Changes here affect all connected clients.
- Node registration flow: noise handshake (noise.go) → auth (auth.go) → state/DB persistence (state/, db/) → initial map (mapper/).
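The copy-on-write pattern described for NodeStore can be sketched in a few lines. This is a minimal illustration, not the code in node_store.go: the cowStore and snapshot types, their fields, and the string payload are all hypothetical, and the real store carries far more state.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// snapshot is an immutable view of all nodes. It is never mutated after
// it has been published through the atomic pointer.
type snapshot struct {
	nodes map[uint64]string // node ID -> hostname (illustrative payload)
}

// cowStore is a hypothetical, simplified stand-in for NodeStore.
type cowStore struct {
	mu  sync.Mutex               // serialises writers only
	cur atomic.Pointer[snapshot] // readers load this without locking
}

func newCowStore() *cowStore {
	s := &cowStore{}
	s.cur.Store(&snapshot{nodes: map[uint64]string{}})
	return s
}

// get is one atomic pointer load followed by a map lookup.
func (s *cowStore) get(id uint64) (string, bool) {
	name, ok := s.cur.Load().nodes[id]
	return name, ok
}

// put copies the current map, applies the change to the copy, and swaps
// the pointer. Readers holding the old snapshot are unaffected.
func (s *cowStore) put(id uint64, name string) {
	s.mu.Lock()
	defer s.mu.Unlock()

	old := s.cur.Load()
	next := make(map[uint64]string, len(old.nodes)+1)
	for k, v := range old.nodes {
		next[k] = v
	}
	next[id] = name
	s.cur.Store(&snapshot{nodes: next})
}

func main() {
	s := newCowStore()
	before := s.cur.Load() // a reader still holding the old snapshot
	s.put(1, "node-a")

	_, inOld := before.nodes[1]
	name, inNew := s.get(1)
	fmt.Println(inOld, name, inNew) // false node-a true
}
```

Each write in this sketch costs a full copy of the map, which is the trade-off to keep in mind before adding writes to a hot path.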
Database Migration Rules
These rules are load-bearing — violating them corrupts production
databases. The migrationsRequiringFKDisabled map in
hscontrol/db/db.go:962 is frozen as of 2025-07-02 (see the comment at
db.go:989). All new migrations must:
- Never reorder existing migrations. Migration order is immutable once committed.
- Only add new migrations to the end of the migrations array.
- Never disable foreign keys. No new entries in migrationsRequiringFKDisabled.
- Use the migration ID format YYYYMMDDHHMM-short-description (timestamp + descriptive suffix). Example: 202602201200-clear-tagged-node-user-id.
- Never rename columns that later migrations reference. Let AutoMigrate create a new column if needed.
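As an illustration of the ID format rule, the following sketch checks a candidate migration ID before it is added. The validMigrationID helper is hypothetical and does not exist in hscontrol/db; the restriction to lowercase letters, digits, and hyphens in the suffix is an assumption drawn from the single example above.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// idPattern matches 12 digits, a hyphen, then lowercase words joined by
// hyphens. The suffix character set is an assumption, not a repo rule.
var idPattern = regexp.MustCompile(`^(\d{12})-[a-z0-9]+(?:-[a-z0-9]+)*$`)

// validMigrationID is a hypothetical helper that reports whether id
// follows the YYYYMMDDHHMM-short-description shape with a real timestamp.
func validMigrationID(id string) bool {
	m := idPattern.FindStringSubmatch(id)
	if m == nil {
		return false
	}
	// 200601021504 is Go's reference layout for YYYYMMDDHHMM.
	_, err := time.Parse("200601021504", m[1])
	return err == nil
}

func main() {
	fmt.Println(validMigrationID("202602201200-clear-tagged-node-user-id")) // true
	fmt.Println(validMigrationID("202613011200-bad-month"))                 // false
	fmt.Println(validMigrationID("clear-tagged-node-user-id"))              // false
}
```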
Tags-as-Identity
Headscale enforces tags XOR user ownership: every node is either tagged (owned by tags) or user-owned (owned by a user namespace), never both. This is a load-bearing architectural invariant.
- Use node.IsTagged() (hscontrol/types/node.go:221) to determine ownership, not node.UserID().Valid(). A tagged node may still have UserID set for "created by" tracking; IsTagged() is authoritative.
- IsUserOwned() (node.go:227) returns !IsTagged().
- Tagged nodes are presented to Tailscale as the special TaggedDevices user (hscontrol/types/users.go, ID 2147455555).
- SetTags validation is enforced by validateNodeOwnership() in hscontrol/state/tags.go.
- Examples and edge cases live in hscontrol/types/node_tags_test.go and hscontrol/grpcv1_test.go (TestSetTags_*).
Do not infer ownership from UserID alone:
if node.UserID().Valid() { /* assume user-owned */ } // WRONG
if node.UserID().Valid() && !node.IsTagged() { /* ok */ } // correct
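To see why a UserID check alone misclassifies nodes, consider this self-contained sketch. The node type here is a hypothetical, pared-down model and not the real type in hscontrol/types/node.go; in particular, treating "has at least one tag" as the definition of isTagged is an assumption made for illustration.

```go
package main

import "fmt"

// node is a hypothetical model for illustration only.
type node struct {
	tags   []string
	userID *uint // may be set on tagged nodes for "created by" tracking
}

// isTagged is assumed here to mean "has at least one tag".
func (n node) isTagged() bool { return len(n.tags) > 0 }

// isUserOwned mirrors the documented rule: it is the negation of isTagged.
func (n node) isUserOwned() bool { return !n.isTagged() }

// owner classifies a node by tags first and never by userID alone.
func owner(n node) string {
	if n.isTagged() {
		return "tagged"
	}
	return "user"
}

func main() {
	creator := uint(42)
	tagged := node{tags: []string{"tag:server"}, userID: &creator}

	// userID is set, yet the node is owned by its tags.
	fmt.Println(owner(tagged), tagged.userID != nil) // tagged true
}
```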
Policy Engine
hscontrol/policy/v2/policy.go is the policy implementation. The
top-level hscontrol/policy/policy.go contains only wrapper functions
around v2. There is no v1 directory.
Key concepts an agent will encounter:
- Autogroups: autogroup:self, autogroup:member, autogroup:internet
- Tag owners: IP-based authorization for who can claim a tag
- Route approvals: auto-approval of subnet routes by policy
- SSH policies: SSH access control via grants
- HuJSON parsing for policy files
For usage examples, read hscontrol/policy/v2/policy_test.go. For ACL
reference documentation, see docs/.
Integration Testing
Before running any hi command, read cmd/hi/README.md in full.
Guessing at hi flags leads to broken runs and stale containers.
Test-authoring patterns (EventuallyWithT, IntegrationSkip, helper
variants, scenario setup) are documented in integration/README.md.
Key reminders:
- Integration test functions must start with IntegrationSkip(t).
- External calls (client.Status, headscale.ListNodes, etc.) belong inside EventuallyWithT; state-mutating commands (tailscale set) must not.
- Tests generate ~100 MB of logs per run under control_logs/{runID}/. Prune old runs if disk is tight.
- Flakes are almost always code, not infrastructure. Read hs-*.stderr.log before blaming Docker.
Code Conventions
- Commit messages follow Go-style package: imperative description. Recent examples from git log:
  - db: scope DestroyUser to only delete the target user's pre-auth keys
  - state: fix policy change race in UpdateNodeFromMapRequest
  - integration: fix ACL tests for address-family-specific resolve

  Not Conventional Commits. No feat:/chore:/docs: prefixes.
- Protobuf regeneration: changes under proto/ require make generate (which runs buf generate) and should land in a separate commit from the callers that use the regenerated types.
- Formatting is enforced by golangci-lint with golines (width 88) and gofumpt. Run make fmt or rely on the pre-commit hook.
- Logging uses zerolog. Prefer single-line chains (log.Info().Str(...).Msg(...)). For 4+ fields or conditional fields, build incrementally and reassign the event variable: e = e.Str("k", v). Forgetting to reassign silently drops the field.
- Tests: prefer hscontrol/servertest/ for server-level tests that don't need Docker, which is faster than full integration tests.
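The commit message convention above can be approximated with a small check. This is a sketch only: validSubject is a hypothetical helper, not a hook that exists in the repository, and the regular expression for the package prefix is an assumption based on the three examples.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// subject matches "package: description". The allowed prefix characters
// are an assumption inferred from the examples in this file.
var subject = regexp.MustCompile(`^[a-z][a-z0-9/_-]*: \S.*$`)

// conventional lists the Conventional Commits prefixes this file rejects.
var conventional = map[string]bool{"feat": true, "chore": true, "docs": true}

// validSubject reports whether s looks like a Go-style commit subject.
func validSubject(s string) bool {
	if !subject.MatchString(s) {
		return false
	}
	prefix, _, _ := strings.Cut(s, ":")
	return !conventional[prefix]
}

func main() {
	fmt.Println(validSubject("state: fix policy change race in UpdateNodeFromMapRequest")) // true
	fmt.Println(validSubject("feat: add node expiry"))                                     // false
	fmt.Println(validSubject("Fix stuff"))                                                 // false
}
```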
Gotchas
- Database: SQLite for local dev, PostgreSQL for integration-heavy tests (go run ./cmd/hi run "..." --postgres). Some race conditions only surface on one backend.
- NodeStore writes rebuild a full snapshot. Measure before changing hot-path code.
- .claude/agents/ is deprecated. Do not create new agent files there. Put behavioural guidance in this file and procedural guidance in the nearest README.
- Do not edit gen/. It is regenerated from proto/ by make generate.
- Proto changes + code changes should be two commits, not one.