tags: process tags on registration, simplify policy (#2931)

This PR investigates, adds tests for, and aims to correctly implement Tailscale's model for how Tags should be accepted, assigned and used to identify nodes in the Tailscale access and ownership model.

When evaluating Headscale's policy, Tags are now only checked against a node's "tags" list, which is the source of truth for all tags for a given node. This greatly simplifies the code for dealing with tags, and should help us have fewer access bugs related to nodes belonging to tags or users. A node can be owned either by a user or by a tag.

Next, to ensure the tags list on the node is correctly implemented, we first add tests for every registration scenario and combination of user, pre auth key and pre auth key with tags, using the registration expectations observed by trying them all against the Tailscale control server. This should ensure that we implement the correct behaviour and that it does not change or break over time.

Lastly, the missing parts of the auth have been added, or changed in the cases where they were wrong. This has in large part allowed us to delete and simplify a lot of code.

Now, tags can only be changed when a node authenticates or when they are set via the CLI/API. Tags can only be fully overwritten/replaced, and any use of either auth or the CLI will replace the current set if it differs. A user-owned device can be converted to a tagged device, but it cannot be changed back. A tagged device can never remove its last tag either; it must have at least one.
2026-04-22 08:38:39 +02:00 · 2025-12-08 18:51:07 +01:00
parent 1f5df017a1
commit 22ee2bfc9c
24 changed files with 3414 additions and 1001 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -411,7 +411,46 @@ go run ./cmd/hi run "TestPattern*"

 - Only ONE test can run at a time (Docker port conflicts)
 - Tests generate ~100MB of logs per run in `control_logs/`
-- Clean environment before each test: `rm -rf control_logs/202507* && docker system prune -f`
+- Clean environment before each test: `sudo rm -rf control_logs/202* && docker system prune -f`
+
+### Full Matrix Testing
+
+Some integration tests support **full matrix mode** that tests all combinations of test dimensions. This is critical for comprehensive validation but can take up to 2 hours to complete.
+
+**Example: TestAutoApproveMultiNetwork Full Matrix**
+
+```bash
+# Set GOPATH to avoid environment issues
+export GOPATH=$HOME/go
+
+# Enable full matrix mode and run with generous timeout
+HEADSCALE_INTEGRATION_FULL_MATRIX=1 go run ./cmd/hi run "TestAutoApproveMultiNetwork" --timeout=7200s
+```
+
+**Full Matrix Dimensions:**
+- **Base scenarios (6):** All combinations of:
+  - Auth methods: `authkey`, `webauth`
+  - Approver types: `tag`, `user`, `group`
+- **Policy modes (2):** `database`, `file`
+- **Advertisement timing (2):** `advertiseduringup-true`, `advertiseduringup-false`
+- **Total combinations:** 6 × 2 × 2 = **24 tests**
+
+**Default (minimal) mode:** Runs only 3 representative tests covering all dimensions:
+- `authkey-tag-advertiseduringup-false-pol-database`
+- `webauth-user-advertiseduringup-true-pol-file`
+- `authkey-group-advertiseduringup-false-pol-file`
+
+**Full Matrix Requirements:**
+- **Time:** Up to 2 hours for complete execution
+- **Disk space:** ~2-3GB for all test artifacts
+- **Environment:** Clean Docker state before starting
+- **Timeout:** Use `--timeout=7200s` (2 hours) minimum
+
+**When to use full matrix:**
+- Before major releases or merges to main
+- After changes to route management, ACL evaluation, or policy engine
+- When debugging flaky tests or cross-scenario issues
+- For comprehensive validation of tags-as-identity changes

 ### Test Artifacts Location