flake.lock: Update

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/608d0ca' (2026-03-08)
  → 'github:NixOS/nixpkgs/8d8c1fa' (2026-04-02)
Use pymdownx.magiclink with its default configuration
2026-04-11 19:39:48 +02:00 · 2026-04-05 00:36:57 +00:00 · 2026-04-02 21:24:27 +02:00 · 2026-04-02 21:24:27 +02:00 · 2026-04-01 14:10:42 +01:00 · 2026-04-01 14:10:42 +01:00
1170 changed files with 221512 additions and 26277 deletions
--- a/.claude/agents/headscale-integration-tester.md
+++ b/.claude/agents/headscale-integration-tester.md
@@ -0,0 +1,870 @@
+---
+name: headscale-integration-tester
+description: Use this agent when you need to execute, analyze, or troubleshoot Headscale integration tests. This includes running specific test scenarios, investigating test failures, interpreting test artifacts, validating end-to-end functionality, or ensuring integration test quality before releases. Examples: <example>Context: User has made changes to the route management code and wants to validate the changes work correctly. user: 'I've updated the route advertisement logic in poll.go. Can you run the relevant integration tests to make sure everything still works?' assistant: 'I'll use the headscale-integration-tester agent to run the subnet routing integration tests and analyze the results.' <commentary>Since the user wants to validate route-related changes with integration tests, use the headscale-integration-tester agent to execute the appropriate tests and analyze results.</commentary></example> <example>Context: A CI pipeline integration test is failing and the user needs help understanding why. user: 'The TestSubnetRouterMultiNetwork test is failing in CI. The logs show some timing issues but I can't figure out what's wrong.' assistant: 'Let me use the headscale-integration-tester agent to analyze the test failure and examine the artifacts.' <commentary>Since this involves analyzing integration test failures and interpreting test artifacts, use the headscale-integration-tester agent to investigate the issue.</commentary></example>
+color: green
+---
+
+You are a specialist Quality Assurance Engineer with deep expertise in Headscale's integration testing system. You understand the Docker-based test infrastructure, real Tailscale client interactions, and the complex timing considerations involved in end-to-end network testing.
+
+## Integration Test System Overview
+
+The Headscale integration test system uses Docker containers running real Tailscale clients against a Headscale server. Tests validate end-to-end functionality including routing, ACLs, node lifecycle, and network coordination. The system is built around the `hi` (Headscale Integration) test runner in `cmd/hi/`.
+
+## Critical Test Execution Knowledge
+
+### System Requirements and Setup
+```bash
+# ALWAYS run this first to verify system readiness
+go run ./cmd/hi doctor
+```
+This command verifies:
+- Docker installation and daemon status
+- Go environment setup
+- Required container images availability
+- Sufficient disk space (critical - tests generate ~100MB logs per run)
+- Network configuration
+
+### Test Execution Patterns
+
+**CRITICAL TIMEOUT REQUIREMENTS**:
+- **NEVER use bash `timeout` command** - this can cause test failures and incomplete cleanup
+- **ALWAYS use the built-in `--timeout` flag** with generous timeouts (minimum 15 minutes)
+- **Increase timeout if tests ever time out** - infrastructure issues require longer timeouts
+
+```bash
+# Single test execution (recommended for development)
+# ALWAYS use --timeout flag with minimum 15 minutes (900s)
+go run ./cmd/hi run "TestSubnetRouterMultiNetwork" --timeout=900s
+
+# Database-heavy tests require PostgreSQL backend and longer timeouts
+go run ./cmd/hi run "TestExpireNode" --postgres --timeout=1800s
+
+# Pattern matching for related tests - use longer timeout for multiple tests
+go run ./cmd/hi run "TestSubnet*" --timeout=1800s
+
+# Long-running individual tests need extended timeouts
+go run ./cmd/hi run "TestNodeOnlineStatus" --timeout=2100s  # Runs for 12+ minutes
+
+# Full test suite (CI/validation only) - very long timeout required
+go test ./integration -timeout 45m
+```
+
+**Timeout Guidelines by Test Type**:
+- **Basic functionality tests**: `--timeout=900s` (15 minutes minimum)
+- **Route/ACL tests**: `--timeout=1200s` (20 minutes)
+- **HA/failover tests**: `--timeout=1800s` (30 minutes)
+- **Long-running tests**: `--timeout=2100s` (35 minutes)
+- **Full test suite**: `-timeout 45m` (45 minutes)
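+
+As a rough illustration of the guideline list above, the mapping from test type to `--timeout` value could be captured in a small Go helper. This is only a sketch: `timeoutByCategory`, `hiRunArgs`, and the category names are hypothetical and are not part of the `hi` runner.
+
+```go
+package main
+
+import (
+    "fmt"
+    "time"
+)
+
+// timeoutByCategory mirrors the guideline list; the category keys are
+// made up for this example.
+var timeoutByCategory = map[string]time.Duration{
+    "basic":        15 * time.Minute,
+    "route-acl":    20 * time.Minute,
+    "ha-failover":  30 * time.Minute,
+    "long-running": 35 * time.Minute,
+}
+
+// hiRunArgs builds the arguments for `go run ./cmd/hi run ...`, falling
+// back to the 15-minute minimum when the category is unknown.
+func hiRunArgs(test, category string) []string {
+    d, ok := timeoutByCategory[category]
+    if !ok {
+        d = 15 * time.Minute
+    }
+    timeout := fmt.Sprintf("--timeout=%ds", int(d.Seconds()))
+    return []string{"run", "./cmd/hi", "run", test, timeout}
+}
+
+func main() {
+    fmt.Println(hiRunArgs("TestHASubnetRouterFailover", "ha-failover"))
+}
+```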
+
+**NEVER do this**:
+```bash
+# ❌ FORBIDDEN: Never use bash timeout command
+timeout 300 go run ./cmd/hi run "TestName"
+
+# ❌ FORBIDDEN: Too short timeout will cause failures
+go run ./cmd/hi run "TestName" --timeout=60s
+```
+
+### Test Categories and Timing Expectations
+- **Fast tests** (<2 min): Basic functionality, CLI operations
+- **Medium tests** (2-5 min): Route management, ACL validation
+- **Slow tests** (5+ min): Node expiration, HA failover
+- **Long-running tests** (10+ min): `TestNodeOnlineStatus` runs for 12+ minutes
+
+**CONCURRENT EXECUTION**: Multiple tests CAN run simultaneously. Each test run gets a unique Run ID for isolation. See "Concurrent Execution and Run ID Isolation" section below.
+
+## Test Artifacts and Log Analysis
+
+### Artifact Structure
+All test runs save comprehensive artifacts to `control_logs/TIMESTAMP-ID/`:
+```
+control_logs/20250713-213106-iajsux/
+├── hs-testname-abc123.stderr.log     # Headscale server error logs
+├── hs-testname-abc123.stdout.log     # Headscale server output logs
+├── hs-testname-abc123.db             # Database snapshot for post-mortem
+├── hs-testname-abc123_metrics.txt    # Prometheus metrics dump
+├── hs-testname-abc123-mapresponses/  # Protocol-level debug data
+├── ts-client-xyz789.stderr.log       # Tailscale client error logs
+├── ts-client-xyz789.stdout.log       # Tailscale client output logs
+└── ts-client-xyz789_status.json      # Client network status dump
+```
+
+### Log Analysis Priority Order
+When tests fail, examine artifacts in this specific order:
+
+1. **Headscale server stderr logs** (`hs-*.stderr.log`): Look for errors, panics, database issues, policy evaluation failures
+2. **Tailscale client stderr logs** (`ts-*.stderr.log`): Check for authentication failures, network connectivity issues
+3. **MapResponse JSON files**: Protocol-level debugging for network map generation issues
+4. **Client status dumps** (`*_status.json`): Network state and peer connectivity information
+5. **Database snapshots** (`.db` files): For data consistency and state persistence issues
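+
+The priority order above can be expressed as an ordered list of glob patterns. The following Go sketch sorts artifact names accordingly; `artifactPriority`, `rank`, and `sortArtifacts` are hypothetical helpers, and the patterns are inferred from the artifact listing shown earlier rather than taken from the test runner.
+
+```go
+package main
+
+import (
+    "fmt"
+    "path/filepath"
+    "sort"
+)
+
+// artifactPriority lists glob patterns in the order artifacts should be read.
+var artifactPriority = []string{
+    "hs-*.stderr.log",   // 1. Headscale server stderr
+    "ts-*.stderr.log",   // 2. Tailscale client stderr
+    "hs-*-mapresponses", // 3. MapResponse directory
+    "*_status.json",     // 4. client status dumps
+    "*.db",              // 5. database snapshots
+}
+
+// rank returns the index of the first matching pattern, or
+// len(artifactPriority) when nothing matches (sorted last).
+func rank(name string) int {
+    for i, pattern := range artifactPriority {
+        if ok, _ := filepath.Match(pattern, name); ok {
+            return i
+        }
+    }
+    return len(artifactPriority)
+}
+
+// sortArtifacts returns a copy of names ordered by analysis priority.
+func sortArtifacts(names []string) []string {
+    sorted := append([]string(nil), names...)
+    sort.SliceStable(sorted, func(i, j int) bool {
+        return rank(sorted[i]) < rank(sorted[j])
+    })
+    return sorted
+}
+
+func main() {
+    for _, name := range sortArtifacts([]string{
+        "ts-client-xyz789_status.json",
+        "hs-testname-abc123.db",
+        "ts-client-xyz789.stderr.log",
+        "hs-testname-abc123-mapresponses",
+        "hs-testname-abc123.stderr.log",
+    }) {
+        fmt.Println(name)
+    }
+}
+```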
+
+## Concurrent Execution and Run ID Isolation
+
+### Overview
+
+The integration test system supports running multiple tests concurrently on the same Docker daemon. Each test run is isolated through a unique Run ID that ensures containers, networks, and cleanup operations don't interfere with each other.
+
+### Run ID Format and Usage
+
+Each test run generates a unique Run ID in the format: `YYYYMMDD-HHMMSS-{6-char-hash}`
+- Example: `20260109-104215-mdjtzx`
+
+The Run ID is used for:
+- **Container naming**: `ts-{runIDShort}-{version}-{hash}` (e.g., `ts-mdjtzx-1-74-fgdyls`)
+- **Docker labels**: All containers get `hi.run-id={runID}` label
+- **Log directories**: `control_logs/{runID}/`
+- **Cleanup isolation**: Only containers with matching run ID are cleaned up
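+
+To make the format concrete, here is a minimal Go sketch that splits a Run ID into its parts and derives the log directory and label filter described above. The `runID` type and `parseRunID` function are hypothetical, and the assumption that the hash consists of lowercase letters and digits is a guess based on the example ID.
+
+```go
+package main
+
+import (
+    "fmt"
+    "path"
+    "regexp"
+    "time"
+)
+
+// runIDPattern assumes a lowercase alphanumeric 6-character hash.
+var runIDPattern = regexp.MustCompile(`^(\d{8}-\d{6})-([a-z0-9]{6})$`)
+
+type runID struct {
+    Full    string    // e.g. 20260109-104215-mdjtzx
+    Started time.Time // parsed from the YYYYMMDD-HHMMSS prefix
+    Short   string    // the 6-character hash used in container names
+}
+
+// parseRunID validates s against YYYYMMDD-HHMMSS-{6-char-hash}.
+func parseRunID(s string) (runID, error) {
+    m := runIDPattern.FindStringSubmatch(s)
+    if m == nil {
+        return runID{}, fmt.Errorf("run ID %q has an unexpected format", s)
+    }
+    started, err := time.Parse("20060102-150405", m[1])
+    if err != nil {
+        return runID{}, fmt.Errorf("run ID %q has an invalid timestamp: %w", s, err)
+    }
+    return runID{Full: s, Started: started, Short: m[2]}, nil
+}
+
+// LogDir returns the artifact directory for this run.
+func (r runID) LogDir() string { return path.Join("control_logs", r.Full) }
+
+// LabelFilter returns a value usable with `docker ps --filter`.
+func (r runID) LabelFilter() string { return "label=hi.run-id=" + r.Full }
+
+func main() {
+    id, err := parseRunID("20260109-104215-mdjtzx")
+    if err != nil {
+        panic(err)
+    }
+    fmt.Println(id.Short, id.LogDir(), id.LabelFilter())
+}
+```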
+
+### Container Isolation Mechanisms
+
+1. **Unique Container Names**: Each container includes the run ID for identification
+2. **Docker Labels**: `hi.run-id` and `hi.test-type` labels on all containers
+3. **Dynamic Port Allocation**: All ports use `{HostPort: "0"}` to let the kernel assign free ports
+4. **Per-Run Networks**: Network names include scenario hash for isolation
+5. **Isolated Cleanup**: `killTestContainersByRunID()` only removes containers matching the run ID
+
+### ⚠️ CRITICAL: Never Interfere with Other Test Runs
+
+**FORBIDDEN OPERATIONS** when other tests may be running:
+
+```bash
+# ❌ NEVER do global container cleanup while tests are running
+docker rm -f $(docker ps -q --filter "name=hs-")
+docker rm -f $(docker ps -q --filter "name=ts-")
+
+# ❌ NEVER kill all test containers
+# This will destroy other agents' test sessions!
+
+# ❌ NEVER prune all Docker resources during active tests
+docker system prune -f  # Only safe when NO tests are running
+```
+
+**SAFE OPERATIONS**:
+
+```bash
+# ✅ Clean up only YOUR test run's containers (by run ID)
+# The test runner does this automatically via cleanup functions
+
+# ✅ Clean stale (stopped/exited) containers only
+# Pre-test cleanup only removes stopped containers, not running ones
+
+# ✅ Check what's running before cleanup
+docker ps --filter "name=headscale-test-suite" --format "{{.Names}}"
+```
+
+### Running Concurrent Tests
+
+```bash
+# Start multiple tests in parallel - each gets unique run ID
+go run ./cmd/hi run "TestPingAllByIP" --timeout=900s &
+go run ./cmd/hi run "TestACLAllowUserDst" --timeout=1200s &
+go run ./cmd/hi run "TestOIDCAuthenticationPingAll" --timeout=1200s &
+
+# Monitor running test suites
+docker ps --filter "name=headscale-test-suite" --format "table {{.Names}}\t{{.Status}}"
+```
+
+### Agent Session Isolation Rules
+
+When working as an agent:
+
+1. **Your run ID is unique**: Each test you start gets its own run ID
+2. **Never clean up globally**: Only use run ID-specific cleanup
+3. **Check before cleanup**: Verify no other tests are running if you need to prune resources
+4. **Respect other sessions**: Other agents may have tests running concurrently
+5. **Log directories are isolated**: Your artifacts are in `control_logs/{your-run-id}/`
+
+### Identifying Your Containers
+
+Your test containers can be identified by:
+- The run ID in the container name
+- The `hi.run-id` Docker label
+- The test suite container: `headscale-test-suite-{your-run-id}`
+
+```bash
+# List containers for a specific run ID
+docker ps --filter "label=hi.run-id=20260109-104215-mdjtzx"
+
+# Get your run ID from the test output
+# Look for: "Run ID: 20260109-104215-mdjtzx"
+```
+
+## Common Failure Patterns and Root Cause Analysis
+
+### CRITICAL MINDSET: Code Issues vs Infrastructure Issues
+
+**⚠️ IMPORTANT**: When tests fail, it is ALMOST ALWAYS a code issue with Headscale, NOT infrastructure problems. Do not immediately blame disk space, Docker issues, or timing unless you have thoroughly investigated the actual error logs first.
+
+### Systematic Debugging Process
+
+1. **Read the actual error message**: Don't assume - read the stderr logs completely
+2. **Check Headscale server logs first**: Most issues originate from server-side logic
+3. **Verify client connectivity**: Only after ruling out server issues
+4. **Check timing patterns**: Use proper `EventuallyWithT` patterns
+5. **Infrastructure as last resort**: Only blame infrastructure after code analysis
+
+### Real Failure Patterns
+
+#### 1. Timing Issues (Common but fixable)
+```go
+// ❌ Wrong: Immediate assertions after async operations
+client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
+nodes, _ := headscale.ListNodes()
+require.Len(t, nodes[0].GetAvailableRoutes(), 1) // WILL FAIL
+
+// ✅ Correct: Check the command error, then wait for the async result
+_, _, err := client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
+require.NoError(t, err)
+require.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err := headscale.ListNodes()
+    assert.NoError(c, err)
+    assert.Len(c, nodes[0].GetAvailableRoutes(), 1)
+}, 10*time.Second, 100*time.Millisecond, "route should be advertised")
+```
+
+**`EventuallyWithT` Timeout Guidelines** (per assertion, not per test run):
+- Route operations: 3-5 seconds
+- Node state changes: 5-10 seconds
+- Complex scenarios: 10-15 seconds
+- Policy recalculation: 5-10 seconds
+
+#### 2. NodeStore Synchronization Issues
+Route advertisements must propagate through poll requests (`poll.go:420`). NodeStore updates happen at specific synchronization points after Hostinfo changes.
+
+#### 3. Test Data Management Issues
+```go
+// ❌ Wrong: Assuming array ordering
+require.Len(t, nodes[0].GetAvailableRoutes(), 1)
+
+// ✅ Correct: Identify nodes by properties
+expectedRoutes := map[string]string{"1": "10.33.0.0/16"}
+for _, node := range nodes {
+    nodeIDStr := fmt.Sprintf("%d", node.GetId())
+    if route, shouldHaveRoute := expectedRoutes[nodeIDStr]; shouldHaveRoute {
+        // Test the specific node that should have the route
+    }
+}
+```
+
+#### 4. Database Backend Differences
+SQLite and PostgreSQL have different timing characteristics:
+- Use `--postgres` flag for database-intensive tests
+- PostgreSQL generally has more consistent timing
+- Some race conditions only appear with specific backends
+
+## Resource Management and Cleanup
+
+### Disk Space Management
+Tests consume significant disk space (~100MB per run):
+```bash
+# Check available space before running tests
+df -h
+
+# Clean up test artifacts periodically
+rm -rf control_logs/older-timestamp-dirs/
+
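+# Clean Docker resources (ONLY when no other tests are running; this removes
+# resources that may belong to concurrent test runs)
+docker system prune -f
+docker volume prune -f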
+```
+
+### Container Cleanup
+- Successful tests clean up automatically
+- Failed tests may leave containers running
+- Manually clean only your own run's containers if needed: list them with `docker ps -a --filter "label=hi.run-id=<run-id>"`, then `docker rm -f <containers>`
+
+## Advanced Debugging Techniques
+
+### Protocol-Level Debugging
+MapResponse JSON files in `control_logs/*/hs-*-mapresponses/` contain:
+- Network topology as sent to clients
+- Peer relationships and visibility
+- Route distribution and primary route selection
+- Policy evaluation results
+
+### Database State Analysis
+Use the database snapshots for post-mortem analysis:
+```bash
+# SQLite examination (substitute the run directory and database file name)
+sqlite3 control_logs/TIMESTAMP/hs-*.db
+# Then, at the sqlite> prompt:
+.tables
+.schema nodes
+SELECT * FROM nodes WHERE name LIKE '%problematic%';
+```
+
+### Performance Analysis
+Prometheus metrics dumps show:
+- Request latencies and error rates
+- NodeStore operation timing
+- Database query performance
+- Memory usage patterns
+
+## Test Development and Quality Guidelines
+
+### Proper Test Patterns
+```go
+// Always use EventuallyWithT for async operations
+require.EventuallyWithT(t, func(c *assert.CollectT) {
+    // Test condition that may take time to become true
+}, timeout, interval, "descriptive failure message")
+
+// Handle node identification correctly
+var targetNode *v1.Node
+for _, node := range nodes {
+    if node.GetName() == expectedNodeName {
+        targetNode = node
+        break
+    }
+}
+require.NotNil(t, targetNode, "should find expected node")
+```
+
+### Quality Validation Checklist
+- ✅ Tests use `EventuallyWithT` for asynchronous operations
+- ✅ Tests don't rely on array ordering for node identification
+- ✅ Proper cleanup and resource management
+- ✅ Tests handle both success and failure scenarios
+- ✅ Timing assumptions are realistic for operations being tested
+- ✅ Error messages are descriptive and actionable
+
+## Real-World Test Failure Patterns from HA Debugging
+
+### Infrastructure vs Code Issues - Detailed Examples
+
+**INFRASTRUCTURE FAILURES (Rare but Real)**:
+1. **DNS Resolution in Auth Tests**: `failed to resolve "hs-pingallbyip-jax97k": no DNS fallback candidates remain`
+   - **Pattern**: Client containers can't resolve headscale server hostname during logout
+   - **Detection**: Error messages specifically mention DNS/hostname resolution
+   - **Solution**: Docker networking reset, not code changes
+
+2. **Container Creation Timeouts**: Test gets stuck during client container setup
+   - **Pattern**: Tests hang indefinitely at container startup phase
+   - **Detection**: No progress in logs for >2 minutes during initialization
+   - **Solution**: `docker system prune -f` (only when no other tests are running) and retry
+
+3. **Docker Resource Exhaustion**: Too many concurrent tests overwhelming system
+   - **Pattern**: Container creation timeouts, OOM kills, slow test execution
+   - **Detection**: System load high, Docker daemon slow to respond
+   - **Solution**: Reduce number of concurrent tests, wait for completion before starting more
+
+**CODE ISSUES (99% of failures)**:
+1. **Route Approval Process Failures**: Routes not getting approved when they should be
+   - **Pattern**: Tests expecting approved routes but finding none
+   - **Detection**: `SubnetRoutes()` returns empty when `AnnouncedRoutes()` shows routes
+   - **Root Cause**: Auto-approval logic bugs, policy evaluation issues
+
+2. **NodeStore Synchronization Issues**: State updates not propagating correctly
+   - **Pattern**: Route changes not reflected in NodeStore or Primary Routes
+   - **Detection**: Logs show route announcements but no tracking updates
+   - **Root Cause**: Missing synchronization points in `poll.go:420` area
+
+3. **HA Failover Architecture Issues**: Routes removed when nodes go offline
+   - **Pattern**: `TestHASubnetRouterFailover` fails because approved routes disappear
+   - **Detection**: Routes available on online nodes but lost when nodes disconnect
+   - **Root Cause**: Conflating route approval with node connectivity
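+
+The detection step for route approval failures (routes announced but not approved) amounts to a set difference. The Go sketch below shows that comparison with the standard `net/netip` package; `unapproved` is a hypothetical helper and does not correspond to any Headscale API.
+
+```go
+package main
+
+import (
+    "fmt"
+    "net/netip"
+)
+
+// unapproved returns the announced prefixes that are missing from approved.
+// A non-empty result for a node that should be auto-approved points at the
+// approval logic or policy evaluation rather than at infrastructure.
+func unapproved(announced, approved []netip.Prefix) []netip.Prefix {
+    seen := make(map[netip.Prefix]bool, len(approved))
+    for _, p := range approved {
+        seen[p] = true
+    }
+    var missing []netip.Prefix
+    for _, p := range announced {
+        if !seen[p] {
+            missing = append(missing, p)
+        }
+    }
+    return missing
+}
+
+func main() {
+    announced := []netip.Prefix{
+        netip.MustParsePrefix("10.33.0.0/16"),
+        netip.MustParsePrefix("10.0.0.0/24"),
+    }
+    approved := []netip.Prefix{netip.MustParsePrefix("10.0.0.0/24")}
+    fmt.Println(unapproved(announced, approved))
+}
+```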
+
+### Critical Test Environment Setup
+
+**Pre-Test Cleanup**:
+
+The test runner automatically handles cleanup:
+- **Before test**: Removes only stale (stopped/exited) containers - does NOT affect running tests
+- **After test**: Removes only containers belonging to the specific run ID
+
+```bash
+# Only clean old log directories if disk space is low
+rm -rf control_logs/202507*
+df -h  # Verify sufficient disk space
+
+# SAFE: Clean only stale/stopped containers (does not affect running tests)
+# The test runner does this automatically via cleanupStaleTestContainers()
+
+# ⚠️ DANGEROUS: Only use when NO tests are running
+docker system prune -f
+```
+
+**Environment Verification**:
+```bash
+# Verify system readiness
+go run ./cmd/hi doctor
+
+# Check what tests are currently running (ALWAYS check before global cleanup)
+docker ps --filter "name=headscale-test-suite" --format "{{.Names}}"
+```
+
+### Specific Test Categories and Known Issues
+
+#### Route-Related Tests (Primary Focus)
+```bash
+# Core route functionality - these should work first
+# Note: Generous timeouts are required for reliable execution
+go run ./cmd/hi run "TestSubnetRouteACL" --timeout=1200s
+go run ./cmd/hi run "TestAutoApproveMultiNetwork" --timeout=1800s
+go run ./cmd/hi run "TestHASubnetRouterFailover" --timeout=1800s
+```
+
+**Common Route Test Patterns**:
+- Tests validate route announcement, approval, and distribution workflows
+- Route state changes are asynchronous - may need `EventuallyWithT` wrappers
+- Route approval must respect ACL policies - test expectations encode security requirements
+- HA tests verify route persistence during node connectivity changes
+
+#### Authentication Tests (Infrastructure-Prone)
+```bash
+# These tests are more prone to infrastructure issues
+# Require longer timeouts due to auth flow complexity
+go run ./cmd/hi run "TestAuthKeyLogoutAndReloginSameUser" --timeout=1200s
+go run ./cmd/hi run "TestAuthWebFlowLogoutAndRelogin" --timeout=1200s
+go run ./cmd/hi run "TestOIDCExpireNodesBasedOnTokenExpiry" --timeout=1800s
+```
+
+**Common Auth Test Infrastructure Failures**:
+- DNS resolution during logout operations
+- Container creation timeouts
+- HTTP/2 stream errors (often symptoms, not root cause)
+
+### Security-Critical Debugging Rules
+
+**❌ FORBIDDEN CHANGES (Security & Test Integrity)**:
+1. **Never change expected test outputs** - Tests define correct behavior contracts
+   - Changing `require.Len(t, routes, 3)` to `require.Len(t, routes, 2)` because test fails
+   - Modifying expected status codes, node counts, or route counts
+   - Removing assertions that are "inconvenient"
+   - **Why forbidden**: Test expectations encode business requirements and security policies
+
+2. **Never bypass security mechanisms** - Security must never be compromised for convenience
+   - Using `AnnouncedRoutes()` instead of `SubnetRoutes()` in production code
+   - Skipping authentication or authorization checks
+   - **Why forbidden**: Security bypasses create vulnerabilities in production
+
+3. **Never reduce test coverage** - Tests prevent regressions
+   - Removing test cases or assertions
+   - Commenting out "problematic" test sections
+   - **Why forbidden**: Reduced coverage allows bugs to slip through
+
+**✅ ALLOWED CHANGES (Timing & Observability)**:
+1. **Fix timing issues with proper async patterns**
+   ```go
+   // ✅ GOOD: Add EventuallyWithT for async operations
+   require.EventuallyWithT(t, func(c *assert.CollectT) {
+       nodes, err := headscale.ListNodes()
+       assert.NoError(c, err)
+       assert.Len(c, nodes, expectedCount) // Keep original expectation
+   }, 10*time.Second, 100*time.Millisecond, "nodes should reach expected count")
+   ```
+   - **Why allowed**: Fixes race conditions without changing business logic
+
+2. **Add MORE observability and debugging**
+   - Additional logging statements
+   - More detailed error messages
+   - Extra assertions that verify intermediate states
+   - **Why allowed**: Better observability helps debug without changing behavior
+
+3. **Improve test documentation**
+   - Add godoc comments explaining test purpose and business logic
+   - Document timing requirements and async behavior
+   - **Why encouraged**: Helps future maintainers understand intent
+
+### Advanced Debugging Workflows
+
+#### Route Tracking Debug Flow
+```bash
+# Run test with detailed logging and proper timeout
+go run ./cmd/hi run "TestSubnetRouteACL" --timeout=1200s > test_output.log 2>&1
+
+# Check route approval process
+grep -E "(auto-approval|ApproveRoutesWithPolicy|PolicyManager)" test_output.log
+
+# Check route tracking
+tail -50 control_logs/*/hs-*.stderr.log | grep -E "(announced|tracking|SetNodeRoutes)"
+
+# Check for security violations
+grep -E "(AnnouncedRoutes.*SetNodeRoutes|bypass.*approval)" test_output.log
+```
+
+#### HA Failover Debug Flow
+```bash
+# Test HA failover specifically with adequate timeout
+go run ./cmd/hi run "TestHASubnetRouterFailover" --timeout=1800s
+
+# Check route persistence during disconnect
+grep -E "(Disconnect|NodeWentOffline|PrimaryRoutes)" control_logs/*/hs-*.stderr.log
+
+# Verify routes don't disappear inappropriately
+grep -E "(removing.*routes|SetNodeRoutes.*empty)" control_logs/*/hs-*.stderr.log
+```
+
+### Test Result Interpretation Guidelines
+
+#### Success Patterns to Look For
+- `"updating node routes for tracking"` in logs
+- Routes appearing in `announcedRoutes` logs
+- Proper `ApproveRoutesWithPolicy` calls for auto-approval
+- Routes persisting through node connectivity changes (HA tests)
+
+#### Failure Patterns to Investigate
+- `SubnetRoutes()` returning empty when `AnnouncedRoutes()` has routes
+- Routes disappearing when nodes go offline (HA architectural issue)
+- Missing `EventuallyWithT` causing timing race conditions
+- Security bypass attempts using wrong route methods
+
+### Critical Testing Methodology
+
+**Phase-Based Testing Approach**:
+1. **Phase 1**: Core route tests (ACL, auto-approval, basic functionality)
+2. **Phase 2**: HA and complex route scenarios
+3. **Phase 3**: Auth tests (infrastructure-sensitive, test last)
+
+**Per-Test Process**:
+1. Clean environment before each test
+2. Monitor logs for route tracking and approval messages
+3. Check artifacts in `control_logs/` if test fails
+4. Focus on actual error messages, not assumptions
+5. Document results and patterns discovered
+
+## Test Documentation and Code Quality Standards
+
+### Adding Missing Test Documentation
+When you understand a test's purpose through debugging, always add comprehensive godoc:
+
+```go
+// TestSubnetRoutes validates the complete subnet route lifecycle including
+// advertisement from clients, policy-based approval, and distribution to peers.
+// This test ensures that route security policies are properly enforced and that
+// only approved routes are distributed to the network.
+//
+// The test verifies:
+// - Route announcements are received and tracked
+// - ACL policies control route approval correctly
+// - Only approved routes appear in peer network maps
+// - Route state persists correctly in the database
+func TestSubnetRoutes(t *testing.T) {
+    // Test implementation...
+}
+```
+
+**Why add documentation**: Future maintainers need to understand business logic and security requirements encoded in tests.
+
+### Comment Guidelines - Focus on WHY, Not WHAT
+
+```go
+// ✅ GOOD: Explains reasoning and business logic
+// Wait for route propagation because NodeStore updates are asynchronous
+// and happen after poll requests complete processing
+require.EventuallyWithT(t, func(c *assert.CollectT) {
+    // Check that security policies are enforced...
+}, timeout, interval, "route approval must respect ACL policies")
+
+// ❌ BAD: Just describes what the code does
+// Wait for routes
+require.EventuallyWithT(t, func(c *assert.CollectT) {
+    // Get routes and check length
+}, timeout, interval, "checking routes")
+```
+
+**Why focus on WHY**: Helps maintainers understand architectural decisions and security requirements.
+
+## EventuallyWithT Pattern for External Calls
+
+### Overview
+EventuallyWithT is a testing pattern used to handle eventual consistency in distributed systems. In Headscale integration tests, many operations are asynchronous: clients advertise routes, the server processes them, and updates propagate through the network. EventuallyWithT lets a test retry its assertions until these operations complete or a timeout expires.
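+
+The underlying idea is a poll-until-deadline loop. The following standard-library Go sketch illustrates that idea only; it is not testify's implementation, and `eventually` is a hypothetical name used for this example.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "time"
+)
+
+// eventually calls cond every interval until it returns nil or until the
+// next attempt would start after the timeout has elapsed.
+func eventually(cond func() error, timeout, interval time.Duration) error {
+    deadline := time.Now().Add(timeout)
+    for {
+        err := cond()
+        if err == nil {
+            return nil
+        }
+        if time.Now().Add(interval).After(deadline) {
+            return fmt.Errorf("condition not met within %s: %w", timeout, err)
+        }
+        time.Sleep(interval)
+    }
+}
+
+func main() {
+    // Simulated async state: the route becomes visible 30ms after start.
+    ready := time.Now().Add(30 * time.Millisecond)
+    attempts := 0
+    err := eventually(func() error {
+        attempts++
+        if time.Now().Before(ready) {
+            return errors.New("route not yet advertised")
+        }
+        return nil
+    }, time.Second, 10*time.Millisecond)
+    fmt.Println("error:", err, "attempts:", attempts)
+}
+```
+
+The condition function only reads state, which is why retrying it is safe; the same loop around a state-changing command would run that command once per attempt.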
+
+### External Calls That Must Be Wrapped
+The following operations are **external calls** that interact with the headscale server or tailscale clients and MUST be wrapped in EventuallyWithT:
+- `headscale.ListNodes()` - Queries server state
+- `client.Status()` - Gets client network status
+- `client.Curl()` - Makes HTTP requests through the network
+- `client.Traceroute()` - Performs network diagnostics
+- `client.Execute()` when running commands that query state
+- Any operation that reads from the headscale server or tailscale client
+
+### Five Key Rules for EventuallyWithT
+
+1. **One External Call Per EventuallyWithT Block**
+   - Each EventuallyWithT should make ONE external call (e.g., ListNodes OR Status)
+   - Related assertions based on that single call can be grouped together
+   - Unrelated external calls must be in separate EventuallyWithT blocks
+
+2. **Variable Scoping**
+   - Declare variables that need to be shared across EventuallyWithT blocks at function scope
+   - Use `=` for assignment inside EventuallyWithT, not `:=` (unless the variable is only used within that block)
+   - Variables declared with `:=` inside EventuallyWithT are not accessible outside
+
+3. **No Nested EventuallyWithT**
+   - NEVER put an EventuallyWithT inside another EventuallyWithT
+   - This is a critical anti-pattern that must be avoided
+
+4. **Use CollectT for Assertions**
+   - Inside EventuallyWithT, use `assert` methods with the CollectT parameter
+   - Helper functions called within EventuallyWithT must accept `*assert.CollectT`
+
+5. **Descriptive Messages**
+   - Always provide a descriptive message as the last parameter
+   - Message should explain what condition is being waited for
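+
+Rule 2 comes down to how Go scopes `:=` inside a closure. The sketch below reproduces the effect without testify; `listNodes`, `shadowedLen`, and `sharedLen` are stand-ins invented for this example.
+
+```go
+package main
+
+import "fmt"
+
+// listNodes stands in for an external call such as headscale.ListNodes().
+func listNodes() ([]string, error) {
+    return []string{"router", "client"}, nil
+}
+
+// shadowedLen uses := inside the closure, so the outer nodes stays nil.
+func shadowedLen() int {
+    var nodes []string
+    func() {
+        nodes, err := listNodes() // declares NEW inner variables
+        _, _ = nodes, err
+    }()
+    return len(nodes)
+}
+
+// sharedLen uses = inside the closure, so the outer variables are updated.
+func sharedLen() (int, error) {
+    var nodes []string
+    var err error
+    func() {
+        nodes, err = listNodes() // assigns to the outer variables
+    }()
+    return len(nodes), err
+}
+
+func main() {
+    fmt.Println(shadowedLen()) // 0
+    fmt.Println(sharedLen())   // 2 <nil>
+}
+```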
+
+### Correct Pattern Examples
+
+```go
+// CORRECT: Single external call with related assertions
+var nodes []*v1.Node
+var err error
+
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err = headscale.ListNodes()
+    assert.NoError(c, err)
+    assert.Len(c, nodes, 2)
+    // These assertions are all based on the ListNodes() call
+    requireNodeRouteCountWithCollect(c, nodes[0], 2, 2, 2)
+    requireNodeRouteCountWithCollect(c, nodes[1], 1, 1, 1)
+}, 10*time.Second, 500*time.Millisecond, "nodes should have expected route counts")
+
+// CORRECT: Separate EventuallyWithT for different external call
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    status, err := client.Status()
+    assert.NoError(c, err)
+    // All these assertions are based on the single Status() call
+    for _, peerKey := range status.Peers() {
+        peerStatus := status.Peer[peerKey]
+        requirePeerSubnetRoutesWithCollect(c, peerStatus, expectedPrefixes)
+    }
+}, 10*time.Second, 500*time.Millisecond, "client should see expected routes")
+
+// CORRECT: Variable scoping for sharing between blocks
+var routeNode *v1.Node
+var nodeKey key.NodePublic
+
+// First EventuallyWithT to get the node
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err := headscale.ListNodes()
+    assert.NoError(c, err)
+
+    for _, node := range nodes {
+        if node.GetName() == "router" {
+            routeNode = node
+            nodeKey, _ = key.ParseNodePublicUntyped(mem.S(node.GetNodeKey()))
+            break
+        }
+    }
+    assert.NotNil(c, routeNode, "should find router node")
+}, 10*time.Second, 100*time.Millisecond, "router node should exist")
+
+// Second EventuallyWithT using the nodeKey from first block
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    status, err := client.Status()
+    assert.NoError(c, err)
+
+    peerStatus, ok := status.Peer[nodeKey]
+    assert.True(c, ok, "peer should exist in status")
+    requirePeerSubnetRoutesWithCollect(c, peerStatus, expectedPrefixes)
+}, 10*time.Second, 100*time.Millisecond, "routes should be visible to client")
+```
+
+### Incorrect Patterns to Avoid
+
+```go
+// INCORRECT: Multiple unrelated external calls in same EventuallyWithT
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    // First external call
+    nodes, err := headscale.ListNodes()
+    assert.NoError(c, err)
+    assert.Len(c, nodes, 2)
+
+    // Second unrelated external call - WRONG!
+    status, err := client.Status()
+    assert.NoError(c, err)
+    assert.NotNil(c, status)
+}, 10*time.Second, 500*time.Millisecond, "mixed operations")
+
+// INCORRECT: Nested EventuallyWithT
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err := headscale.ListNodes()
+    assert.NoError(c, err)
+
+    // NEVER do this!
+    assert.EventuallyWithT(t, func(c2 *assert.CollectT) {
+        status, _ := client.Status()
+        assert.NotNil(c2, status)
+    }, 5*time.Second, 100*time.Millisecond, "nested")
+}, 10*time.Second, 500*time.Millisecond, "outer")
+
+// INCORRECT: Variable scoping error
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err := headscale.ListNodes() // This shadows outer 'nodes' variable
+    assert.NoError(c, err)
+}, 10*time.Second, 500*time.Millisecond, "get nodes")
+
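+// This will fail: without an outer 'nodes' declaration it does not compile, and with
+// one the outer variable is still nil because := created a new variable inside the block
+require.Len(t, nodes, 2) // compile error, or assertion failure on the nil slice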
+
+// INCORRECT: Not wrapping external calls
+nodes, err := headscale.ListNodes() // External call not wrapped!
+require.NoError(t, err)
+```
+
+### Helper Functions for EventuallyWithT
+
+When creating helper functions for use within EventuallyWithT:
+
+```go
+// Helper function that accepts CollectT
+func requireNodeRouteCountWithCollect(c *assert.CollectT, node *v1.Node, available, approved, primary int) {
+    assert.Len(c, node.GetAvailableRoutes(), available, "available routes for node %s", node.GetName())
+    assert.Len(c, node.GetApprovedRoutes(), approved, "approved routes for node %s", node.GetName())
+    assert.Len(c, node.GetPrimaryRoutes(), primary, "primary routes for node %s", node.GetName())
+}
+
+// Usage within EventuallyWithT
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err := headscale.ListNodes()
+    assert.NoError(c, err)
+    requireNodeRouteCountWithCollect(c, nodes[0], 2, 2, 2)
+}, 10*time.Second, 500*time.Millisecond, "route counts should match expected")
+```
+
+### Operations That Must NOT Be Wrapped
+
+**CRITICAL**: The following operations are **blocking/mutating operations** that change state and MUST NOT be wrapped in EventuallyWithT:
+- `tailscale set` commands (e.g., `--advertise-routes`, `--accept-routes`)
+- `headscale.ApproveRoute()` - Approves routes on server
+- `headscale.CreateUser()` - Creates users
+- `headscale.CreatePreAuthKey()` - Creates authentication keys
+- `headscale.RegisterNode()` - Registers new nodes
+- Any `client.Execute()` that modifies configuration
+- Any operation that creates, updates, or deletes resources
+
+These operations:
+1. Complete synchronously or fail immediately
+2. Should not be retried automatically
+3. Need explicit error handling with `require.NoError()`
+
+### Correct Pattern for Blocking Operations
+
+```go
+// CORRECT: Blocking operation NOT wrapped
+status := client.MustStatus()
+command := []string{"tailscale", "set", "--advertise-routes=" + expectedRoutes[string(status.Self.ID)]}
+_, _, err = client.Execute(command)
+require.NoErrorf(t, err, "failed to advertise route: %s", err)
+
+// Then wait for the result with EventuallyWithT
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err := headscale.ListNodes()
+    assert.NoError(c, err)
+    assert.Contains(c, nodes[0].GetAvailableRoutes(), expectedRoutes[string(status.Self.ID)])
+}, 10*time.Second, 100*time.Millisecond, "route should be advertised")
+
+// INCORRECT: Blocking operation wrapped (DON'T DO THIS)
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    _, _, err = client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
+    assert.NoError(c, err) // This might retry the command multiple times!
+}, 10*time.Second, 100*time.Millisecond, "advertise routes")
+```
+
+### Assert vs Require Pattern
+
+When working within EventuallyWithT blocks where you need to prevent panics:
+
+```go
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err := headscale.ListNodes()
+    assert.NoError(c, err)
+
+    // For array bounds - use require with t to prevent panic
+    assert.Len(c, nodes, 6)  // Test expectation
+    require.GreaterOrEqual(t, len(nodes), 3, "need at least 3 nodes to avoid panic")
+
+    // For nil pointer access - use require with t before dereferencing
+    assert.NotNil(c, srs1PeerStatus.PrimaryRoutes)  // Test expectation
+    require.NotNil(t, srs1PeerStatus.PrimaryRoutes, "primary routes must be set to avoid panic")
+    assert.Contains(c,
+        srs1PeerStatus.PrimaryRoutes.AsSlice(),
+        pref,
+    )
+}, 5*time.Second, 200*time.Millisecond, "checking route state")
+```
+
+**Key Principle**:
+- Use `assert` with `c` (*assert.CollectT) for test expectations that can be retried
+- Use `require` with `t` (*testing.T) for MUST conditions that prevent panics
+- Within EventuallyWithT, both are available - choose based on whether failure would cause a panic
+
+### Common Scenarios
+
+1. **Waiting for route advertisement**:
+```go
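+client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"}) // mutating call: run once, unwrapped; real tests should check the returned error with require.NoError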
+
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err := headscale.ListNodes()
+    assert.NoError(c, err)
+    assert.Contains(c, nodes[0].GetAvailableRoutes(), "10.0.0.0/24")
+}, 10*time.Second, 100*time.Millisecond, "route should be advertised")
+```
+
+2. **Checking client sees routes**:
+```go
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    status, err := client.Status()
+    assert.NoError(c, err)
+
+    // Check all peers have expected routes
+    for _, peerKey := range status.Peers() {
+        peerStatus := status.Peer[peerKey]
+        assert.Contains(c, peerStatus.AllowedIPs, expectedPrefix)
+    }
+}, 10*time.Second, 100*time.Millisecond, "all peers should see route")
+```
+
+3. **Sequential operations**:
+```go
+// First wait for node to appear
+var nodeID uint64
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err := headscale.ListNodes()
+    assert.NoError(c, err)
+    assert.Len(c, nodes, 1)
+    nodeID = nodes[0].GetId()
+}, 10*time.Second, 100*time.Millisecond, "node should register")
+
+// Then perform operation
+_, err := headscale.ApproveRoute(nodeID, "10.0.0.0/24")
+require.NoError(t, err)
+
+// Then wait for result
+assert.EventuallyWithT(t, func(c *assert.CollectT) {
+    nodes, err := headscale.ListNodes()
+    assert.NoError(c, err)
+    assert.Contains(c, nodes[0].GetApprovedRoutes(), "10.0.0.0/24")
+}, 10*time.Second, 100*time.Millisecond, "route should be approved")
+```
+
+## Your Core Responsibilities
+
+1. **Test Execution Strategy**: Execute integration tests with appropriate configurations, understanding when to use `--postgres` and the timing requirements for different test categories. Follow a phase-based testing approach that prioritizes route tests.
+   - **Why this priority**: Route tests are less infrastructure-sensitive and validate core security logic
+
+2. **Systematic Test Analysis**: When tests fail, systematically examine artifacts starting with Headscale server logs, then client logs, then protocol data. Focus on CODE ISSUES first (99% of cases), not infrastructure. Use real-world failure patterns to guide investigation.
+   - **Why this approach**: Most failures are logic bugs, not environment issues - efficient debugging saves time
+
+3. **Timing & Synchronization Expertise**: Understand asynchronous Headscale operations, particularly route advertisements, NodeStore synchronization at `poll.go:420`, and policy propagation. Fix timing with `EventuallyWithT` while preserving original test expectations.
+   - **Why preserve expectations**: Test assertions encode business requirements and security policies
+   - **Key Pattern**: Apply the EventuallyWithT pattern correctly for all external calls as documented above
+
+4. **Root Cause Analysis**: Distinguish between actual code regressions (route approval logic, HA failover architecture), timing issues requiring `EventuallyWithT` patterns, and genuine infrastructure problems (DNS, Docker, container issues).
+   - **Why this distinction matters**: Different problem types require completely different solution approaches
+   - **EventuallyWithT Issues**: Often manifest as flaky tests or immediate assertion failures after async operations
+
+5. **Security-Aware Quality Validation**: Ensure tests properly validate end-to-end functionality with realistic timing expectations and proper error handling. Never suggest security bypasses or test expectation changes. Add comprehensive godoc when you understand test business logic.
+   - **Why security focus**: Integration tests are the last line of defense against security regressions
+   - **EventuallyWithT Usage**: Proper use prevents race conditions without weakening security assertions
+
+6. **Concurrent Execution Awareness**: Respect run ID isolation and never interfere with other agents' test sessions. Each test run has a unique run ID - only clean up YOUR containers (by run ID label), never perform global cleanup while tests may be running.
+   - **Why this matters**: Multiple agents/users may run tests concurrently on the same Docker daemon
+   - **Key Rule**: NEVER use global container cleanup commands - the test runner handles cleanup automatically per run ID
+
+**CRITICAL PRINCIPLE**: Test expectations are sacred contracts that define correct system behavior. When tests fail, fix the code to match the test, never change the test to match broken code. Only timing and observability improvements are allowed - business logic expectations are immutable.
+
+**ISOLATION PRINCIPLE**: Each test run is isolated by its unique Run ID. Never interfere with other test sessions. The system handles cleanup automatically - manual global cleanup commands are forbidden when other tests may be running.
+
+**EventuallyWithT PRINCIPLE**: Every external call that reads state from the headscale server or a tailscale client must be wrapped in EventuallyWithT; blocking or mutating operations (see "Operations That Must NOT Be Wrapped") run once, outside it. Follow the five key rules strictly: one external call per block, proper variable scoping, no nesting, use CollectT for assertions, and provide descriptive messages.
+
+**Remember**: Test failures are usually code issues in Headscale that need to be fixed, not infrastructure problems to be ignored. Use the specific debugging workflows and failure patterns documented above to efficiently identify root causes. Infrastructure issues have very specific signatures - everything else is code-related.
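
The scoping rule and the "mutate once, then poll" rule from the agent file above can be sketched without testify. This is a minimal, standard-library-only illustration under stated assumptions: `eventually`, `fakeServer`, and `advertiseAndWait` are hypothetical names invented for this sketch, not headscale or testify APIs, and `eventually` only approximates the retry behaviour of `assert.EventuallyWithT`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// eventually is a hypothetical stand-in for assert.EventuallyWithT: it re-runs
// check every tick until it returns nil or waitFor has elapsed.
func eventually(check func() error, waitFor, tick time.Duration) error {
	deadline := time.Now().Add(waitFor)
	for {
		err := check()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met within %s: %w", waitFor, err)
		}
		time.Sleep(tick)
	}
}

// fakeServer is a hypothetical substitute for the control server. An advertised
// route becomes visible only on the third ListRoutes call, mimicking async
// propagation.
type fakeServer struct {
	advertiseCalls int
	pollsLeft      int
	pending        string
	routes         []string
}

// Advertise is the mutating operation; it records how often it was called.
func (s *fakeServer) Advertise(route string) error {
	s.advertiseCalls++
	s.pending = route
	s.pollsLeft = 3
	return nil
}

// ListRoutes is the read-only query that is safe to retry.
func (s *fakeServer) ListRoutes() ([]string, error) {
	if s.pending != "" {
		s.pollsLeft--
		if s.pollsLeft <= 0 {
			s.routes = append(s.routes, s.pending)
			s.pending = ""
		}
	}
	return s.routes, nil
}

func advertiseAndWait(srv *fakeServer, route string) ([]string, error) {
	// Mutating call: executed exactly once, outside the retry loop.
	if err := srv.Advertise(route); err != nil {
		return nil, err
	}

	// Declared outside the closure so the polled result is usable afterwards.
	var routes []string
	err := eventually(func() error {
		var err error
		routes, err = srv.ListRoutes() // '=' assigns the outer variable; ':=' would shadow it
		if err != nil {
			return err
		}
		if len(routes) != 1 {
			return errors.New("route not visible yet")
		}
		return nil
	}, 2*time.Second, 10*time.Millisecond)
	return routes, err
}

func main() {
	srv := &fakeServer{}
	routes, err := advertiseAndWait(srv, "10.0.0.0/24")
	fmt.Println(routes, srv.advertiseCalls, err)
}
```

If `srv.Advertise` were moved inside the closure, every retry would repeat the mutation, which is the failure mode the "Operations That Must NOT Be Wrapped" section warns about.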
--- a/.dockerignore
+++ b/.dockerignore
@@ -21,4 +21,3 @@ LICENSE
 node_modules/
 package-lock.json
 package.json
-
--- a/.editorconfig
+++ b/.editorconfig
@@ -0,0 +1,16 @@
+root = true
+
+[*]
+charset = utf-8
+end_of_line = lf
+indent_size = 2
+indent_style = space
+insert_final_newline = true
+trim_trailing_whitespace = true
+max_line_length = 120
+
+[*.go]
+indent_style = tab
+
+[Makefile]
+indent_style = tab
--- a/.github/ISSUE_TEMPLATE/bug_report.yaml
+++ b/.github/ISSUE_TEMPLATE/bug_report.yaml
@@ -6,8 +6,7 @@ body:
  - type: checkboxes
    attributes:
      label: Is this a support request?
-      description:
-        This issue tracker is for bugs and feature requests only. If you need
+      description: This issue tracker is for bugs and feature requests only. If you need
        help, please use ask in our Discord community
      options:
        - label: This is not a support request
@@ -15,8 +14,7 @@ body:
  - type: checkboxes
    attributes:
      label: Is there an existing issue for this?
-      description:
-        Please search to see if an issue already exists for the bug you
+      description: Please search to see if an issue already exists for the bug you
        encountered.
      options:
        - label: I have searched the existing issues
@@ -52,12 +50,15 @@ body:
        If you are using a container, always provide the headscale version and not only the Docker image version.
        Please do not put "latest".

+        Describe your "headscale network". Are there a lot of nodes? Are the nodes all interconnected? Are some of them subnet routers?
+
        If you are experiencing a problem during an upgrade, please provide the versions of the old and new versions of Headscale and Tailscale.

        examples:
          - **OS**: Ubuntu 24.04
          - **Headscale version**: 0.24.3
          - **Tailscale version**: 1.80.0
+          - **Number of nodes**: 20
      value: |
        - OS:
        - Headscale version:
@@ -77,6 +78,10 @@ body:
    attributes:
      label: Debug information
      description: |
+        Please have a look at our [Debugging and troubleshooting
+        guide](https://headscale.net/development/ref/debug/) to learn about
+        common debugging techniques.
+
        Links? References? Anything that will give us more context about the issue you are encountering.
        If **any** of these are omitted we will likely close your issue, do **not** ignore them.

@@ -92,7 +97,7 @@ body:
        `tailscale status --json > DESCRIPTIVE_NAME.json`

        Get the logs of a Tailscale client that is not working as expected.
-        `tailscale daemon-logs`
+        `tailscale debug daemon-logs`

        Tip: You can attach images or log files by clicking this area to highlight it and then dragging files in.
        **Ensure** you use formatting for files you attach.
--- a/.github/ISSUE_TEMPLATE/config.yml
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -3,9 +3,9 @@ blank_issues_enabled: false

 # Contact links
 contact_links:
-  - name: "headscale usage documentation"
-    url: "https://github.com/juanfont/headscale/blob/main/docs"
-    about: "Find documentation about how to configure and run headscale."
  - name: "headscale Discord community"
-    url: "https://discord.gg/xGj2TuqyxY"
+    url: "https://discord.gg/c84AZQhmpx"
    about: "Please ask and answer questions about usage of headscale here."
+  - name: "headscale usage documentation"
+    url: "https://headscale.net/"
+    about: "Find documentation about how to configure and run headscale."
--- a/.github/ISSUE_TEMPLATE/feature_request.yaml
+++ b/.github/ISSUE_TEMPLATE/feature_request.yaml
@@ -16,15 +16,13 @@ body:
  - type: textarea
    attributes:
      label: Description
-      description:
-        A clear and precise description of what new or changed feature you want.
+      description: A clear and precise description of what new or changed feature you want.
    validations:
      required: true
  - type: checkboxes
    attributes:
      label: Contribution
-      description:
-        Are you willing to contribute to the implementation of this feature?
+      description: Are you willing to contribute to the implementation of this feature?
      options:
        - label: I can write the design doc for this feature
          required: false
@@ -33,7 +31,6 @@ body:
  - type: textarea
    attributes:
      label: How can it be implemented?
-      description:
-        Free text for your ideas on how this feature could be implemented.
+      description: Free text for your ideas on how this feature could be implemented.
    validations:
      required: false
--- a/.github/label-response/needs-more-info.md
+++ b/.github/label-response/needs-more-info.md
@@ -0,0 +1,80 @@
+Thank you for taking the time to report this issue.
+
+To help us investigate and resolve this, we need more information. Please provide the following:
+
+> [!TIP]
+> Most issues turn out to be configuration errors rather than bugs. We encourage you to discuss your problem in our [Discord community](https://discord.gg/c84AZQhmpx) **before** opening an issue. The community can often help identify misconfigurations quickly, saving everyone time.
+
+## Required Information
+
+### Environment Details
+
+- **Headscale version**: (run `headscale version`)
+- **Tailscale client version**: (run `tailscale version`)
+- **Operating System**: (e.g., Ubuntu 24.04, macOS 14, Windows 11)
+- **Deployment method**: (binary, Docker, Kubernetes, etc.)
+- **Reverse proxy**: (if applicable: nginx, Traefik, Caddy, etc. - include configuration)
+
+### Debug Information
+
+Please follow our [Debugging and Troubleshooting Guide](https://headscale.net/stable/ref/debug/) and provide:
+
+1. **Client netmap dump** (from affected Tailscale client):
+
+   ```bash
+   tailscale debug netmap > netmap.json
+   ```
+
+2. **Client status dump** (from affected Tailscale client):
+
+   ```bash
+   tailscale status --json > status.json
+   ```
+
+3. **Tailscale client logs** (if experiencing client issues):
+
+   ```bash
+   tailscale debug daemon-logs
+   ```
+
+   > [!IMPORTANT]
+   > We need logs from **multiple nodes** to understand the full picture:
+   >
+   > - The node(s) initiating connections
+   > - The node(s) being connected to
+   >
+   > Without logs from both sides, we cannot diagnose connectivity issues.
+
+4. **Headscale server logs** with `log.level: trace` enabled
+
+5. **Headscale configuration** (with sensitive values redacted - see rules below)
+
+6. **ACL/Policy configuration** (if using ACLs)
+
+7. **Proxy/Docker configuration** (if applicable - nginx.conf, docker-compose.yml, Traefik config, etc.)
+
+## Formatting Requirements
+
+- **Attach long files** - Do not paste large logs or configurations inline. Use GitHub file attachments or GitHub Gists.
+- **Use proper Markdown** - Format code blocks, logs, and configurations with appropriate syntax highlighting.
+- **Structure your response** - Use the headings above to organize your information clearly.
+
+## Redaction Rules
+
+> [!CAUTION]
+> **Replace, do not remove.** Removing information makes debugging impossible.
+
+When redacting sensitive information:
+
+- ✅ **Replace consistently** - If you change `alice@company.com` to `user1@example.com`, use `user1@example.com` everywhere (logs, config, policy, etc.)
+- ✅ **Use meaningful placeholders** - `user1@example.com`, `bob@example.com`, `my-secret-key` are acceptable
+- ❌ **Never remove information** - Gaps in data prevent us from correlating events across logs
+- ❌ **Never redact IP addresses** - We need the actual IPs to trace network paths and identify issues
+
+**If redaction rules are not followed, we will be unable to debug the issue and will have to close it.**
+
+---
+
+**Note:** This issue will be automatically closed in 3 days if no additional information is provided. Once you reply with the requested information, the `needs-more-info` label will be removed automatically.
+
+If you need help gathering this information, please visit our [Discord community](https://discord.gg/c84AZQhmpx).
--- a/.github/label-response/support-request.md
+++ b/.github/label-response/support-request.md
@@ -0,0 +1,15 @@
+Thank you for reaching out.
+
+This issue tracker is used for **bug reports and feature requests** only. Your question appears to be a support or configuration question rather than a bug report.
+
+For help with setup, configuration, or general questions, please visit our [Discord community](https://discord.gg/c84AZQhmpx) where the community and maintainers can assist you in real-time.
+
+**Before posting in Discord, please check:**
+
+- [Documentation](https://headscale.net/)
+- [FAQ](https://headscale.net/stable/faq/)
+- [Debugging and Troubleshooting Guide](https://headscale.net/stable/ref/debug/)
+
+If after troubleshooting you determine this is actually a bug, please open a new issue with the required debug information from the troubleshooting guide.
+
+This issue has been automatically closed.
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -5,8 +5,6 @@ on:
    branches:
      - main
  pull_request:
-    branches:
-      - main

 concurrency:
  group: ${{ github.workflow }}-$${{ github.head_ref || github.run_id }}
@@ -17,7 +15,7 @@ jobs:
    runs-on: ubuntu-latest
    permissions: write-all
    steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          fetch-depth: 2
      - name: Get changed files
@@ -31,13 +29,12 @@ jobs:
              - '**/*.go'
              - 'integration_test/'
              - 'config-example.yaml'
-      - uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
        if: steps.changed-files.outputs.files == 'true'
      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
        if: steps.changed-files.outputs.files == 'true'
        with:
-          primary-key:
-            nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
            '**/flake.lock') }}
          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

@@ -57,7 +54,7 @@ jobs:
          exit $BUILD_STATUS

      - name: Nix gosum diverging
-        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1
+        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
        if: failure() && steps.build.outcome == 'failure'
        with:
          github-token: ${{secrets.GITHUB_TOKEN}}
@@ -69,7 +66,7 @@ jobs:
              body: 'Nix build failed with wrong gosum, please update "vendorSha256" (${{ steps.build.outputs.OLD_HASH }}) for the "headscale" package in flake.nix with the new SHA: ${{ steps.build.outputs.NEW_HASH }}'
            })

-      - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+      - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
        if: steps.changed-files.outputs.files == 'true'
        with:
          name: headscale-linux
@@ -79,29 +76,25 @@ jobs:
    strategy:
      matrix:
        env:
-          - "GOARCH=arm   GOOS=linux GOARM=5"
-          - "GOARCH=arm   GOOS=linux GOARM=6"
-          - "GOARCH=arm   GOOS=linux GOARM=7"
          - "GOARCH=arm64 GOOS=linux"
-          - "GOARCH=386   GOOS=linux"
          - "GOARCH=amd64 GOOS=linux"
          - "GOARCH=arm64 GOOS=darwin"
          - "GOARCH=amd64 GOOS=darwin"
    steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
-      - uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
        with:
-          primary-key:
-            nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
            '**/flake.lock') }}
          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

      - name: Run go cross compile
-        run:
-          env ${{ matrix.env }} nix develop --command -- go build -o "headscale"
+        env:
+          CGO_ENABLED: 0
+        run: env ${{ matrix.env }} nix develop --command -- go build -o "headscale"
          ./cmd/headscale
-      - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
+      - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
        with:
          name: "headscale-${{ matrix.env }}"
          path: "headscale"
--- a/.github/workflows/check-generated.yml
+++ b/.github/workflows/check-generated.yml
@@ -0,0 +1,55 @@
+name: Check Generated Files
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+concurrency:
+  group: ${{ github.workflow }}-$${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  check-generated:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
+        with:
+          fetch-depth: 2
+      - name: Get changed files
+        id: changed-files
+        uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
+        with:
+          filters: |
+            files:
+              - '*.nix'
+              - 'go.*'
+              - '**/*.go'
+              - '**/*.proto'
+              - 'buf.gen.yaml'
+              - 'tools/**'
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
+        if: steps.changed-files.outputs.files == 'true'
+      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
+        if: steps.changed-files.outputs.files == 'true'
+        with:
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix', '**/flake.lock') }}
+          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
+
+      - name: Run make generate
+        if: steps.changed-files.outputs.files == 'true'
+        run: nix develop --command -- make generate
+
+      - name: Check for uncommitted changes
+        if: steps.changed-files.outputs.files == 'true'
+        run: |
+          if ! git diff --exit-code; then
+            echo "❌ Generated files are not up to date!"
+            echo "Please run 'make generate' and commit the changes."
+            exit 1
+          else
+            echo "✅ All generated files are up to date."
+          fi
--- a/.github/workflows/check-tests.yaml
+++ b/.github/workflows/check-tests.yaml
@@ -10,7 +10,7 @@ jobs:
  check-tests:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          fetch-depth: 2
      - name: Get changed files
@@ -24,13 +24,12 @@ jobs:
              - '**/*.go'
              - 'integration_test/'
              - 'config-example.yaml'
-      - uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
        if: steps.changed-files.outputs.files == 'true'
      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
        if: steps.changed-files.outputs.files == 'true'
        with:
-          primary-key:
-            nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
            '**/flake.lock') }}
          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

--- a/.github/workflows/container-main.yml
+++ b/.github/workflows/container-main.yml
@@ -0,0 +1,112 @@
+---
+name: Build (main)
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - "*.nix"
+      - "go.*"
+      - "**/*.go"
+      - ".github/workflows/container-main.yml"
+  workflow_dispatch:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.sha }}
+  cancel-in-progress: true
+
+jobs:
+  container:
+    if: github.repository == 'juanfont/headscale'
+    runs-on: ubuntu-latest
+    permissions:
+      packages: write
+      contents: read
+    steps:
+      - name: Checkout
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
+
+      - name: Login to DockerHub
+        uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      - name: Login to GHCR
+        uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0
+        with:
+          registry: ghcr.io
+          username: ${{ github.repository_owner }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
+      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
+        with:
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+            '**/flake.lock') }}
+          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
+
+      - name: Set commit timestamp
+        run: echo "SOURCE_DATE_EPOCH=$(git log -1 --format=%ct)" >> $GITHUB_ENV
+
+      - name: Build and push to GHCR
+        env:
+          KO_DOCKER_REPO: ghcr.io/juanfont/headscale
+          KO_DEFAULTBASEIMAGE: gcr.io/distroless/base-debian13
+          CGO_ENABLED: "0"
+        run: |
+          nix develop --command -- ko build \
+            --bare \
+            --platform=linux/amd64,linux/arm64 \
+            --tags=main-${GITHUB_SHA::7} \
+            ./cmd/headscale
+
+      - name: Push to Docker Hub
+        env:
+          KO_DOCKER_REPO: headscale/headscale
+          KO_DEFAULTBASEIMAGE: gcr.io/distroless/base-debian13
+          CGO_ENABLED: "0"
+        run: |
+          nix develop --command -- ko build \
+            --bare \
+            --platform=linux/amd64,linux/arm64 \
+            --tags=main-${GITHUB_SHA::7} \
+            ./cmd/headscale
+
+  binaries:
+    if: github.repository == 'juanfont/headscale'
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        include:
+          - goos: linux
+            goarch: amd64
+          - goos: linux
+            goarch: arm64
+          - goos: darwin
+            goarch: amd64
+          - goos: darwin
+            goarch: arm64
+    steps:
+      - name: Checkout
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
+
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
+      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
+        with:
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+            '**/flake.lock') }}
+          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
+
+      - name: Build binary
+        env:
+          CGO_ENABLED: "0"
+          GOOS: ${{ matrix.goos }}
+          GOARCH: ${{ matrix.goarch }}
+        run: nix develop --command -- go build -o headscale ./cmd/headscale
+
+      - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        with:
+          name: headscale-${{ matrix.goos }}-${{ matrix.goarch }}
+          path: headscale
--- a/.github/workflows/docs-deploy.yml
+++ b/.github/workflows/docs-deploy.yml
@@ -21,15 +21,15 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
-        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          fetch-depth: 0
      - name: Install python
-        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
+        uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v6.1.0
        with:
          python-version: 3.x
      - name: Setup cache
-        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
+        uses: actions/cache@a7833574556fa59680c1b7cb190c1735db73ebf0 # v5.0.0
        with:
          key: ${{ github.ref }}
          path: .cache
--- a/.github/workflows/docs-test.yml
+++ b/.github/workflows/docs-test.yml
@@ -11,13 +11,13 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
-        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
      - name: Install python
-        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
+        uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548 # v6.1.0
        with:
          python-version: 3.x
      - name: Setup cache
-        uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
+        uses: actions/cache@a7833574556fa59680c1b7cb190c1735db73ebf0 # v5.0.0
        with:
          key: ${{ github.ref }}
          path: .cache
--- a/.github/workflows/gh-action-integration-generator.go
+++ b/.github/workflows/gh-action-integration-generator.go
@@ -10,6 +10,55 @@ import (
 	"strings"
 )

+// testsToSplit defines tests that should be split into multiple CI jobs.
+// Key is the test function name, value is a list of subtest prefixes.
+// Each prefix becomes a separate CI job as "TestName/prefix".
+//
+// Example: TestAutoApproveMultiNetwork has subtests like:
+//   - TestAutoApproveMultiNetwork/authkey-tag-advertiseduringup-false-pol-database
+//   - TestAutoApproveMultiNetwork/webauth-user-advertiseduringup-true-pol-file
+//
+// Splitting by auth method (authkey, webauth) and approver type (tag, user, group) creates 6 CI jobs with 4 tests each:
+//   - TestAutoApproveMultiNetwork/authkey-tag.* (4 tests)
+//   - TestAutoApproveMultiNetwork/authkey-user.* (4 tests)
+//   - TestAutoApproveMultiNetwork/authkey-group.* (4 tests)
+//   - TestAutoApproveMultiNetwork/webauth-tag.* (4 tests)
+//   - TestAutoApproveMultiNetwork/webauth-user.* (4 tests)
+//   - TestAutoApproveMultiNetwork/webauth-group.* (4 tests)
+//
+// This reduces load per CI job (4 tests instead of 24) to avoid infrastructure
+// flakiness when running many sequential Docker-based integration tests.
+var testsToSplit = map[string][]string{
+	"TestAutoApproveMultiNetwork": {
+		"authkey-tag",
+		"authkey-user",
+		"authkey-group",
+		"webauth-tag",
+		"webauth-user",
+		"webauth-group",
+	},
+}
+
+// expandTests takes a list of test names and expands any that need splitting
+// into multiple subtest patterns.
+func expandTests(tests []string) []string {
+	var expanded []string
+	for _, test := range tests {
+		if prefixes, ok := testsToSplit[test]; ok {
+			// This test should be split into multiple jobs.
+			// We append ".*" to each prefix because the CI runner wraps patterns
+			// with ^...$ anchors. Without ".*", a pattern like "authkey-tag$" wouldn't
+			// match "authkey-tag-advertiseduringup-false-pol-database".
+			for _, prefix := range prefixes {
+				expanded = append(expanded, fmt.Sprintf("%s/%s.*", test, prefix))
+			}
+		} else {
+			expanded = append(expanded, test)
+		}
+	}
+	return expanded
+}
+
 func findTests() []string {
 	rgBin, err := exec.LookPath("rg")
 	if err != nil {
@@ -66,8 +115,11 @@ func updateYAML(tests []string, jobName string, testPath string) {
 func main() {
 	tests := findTests()

-	quotedTests := make([]string, len(tests))
-	for i, test := range tests {
+	// Expand tests that should be split into multiple jobs
+	expandedTests := expandTests(tests)
+
+	quotedTests := make([]string, len(expandedTests))
+	for i, test := range expandedTests {
 		quotedTests[i] = fmt.Sprintf("\"%s\"", test)
 	}

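The interaction between the ".*" suffix added by expandTests and the ^...$ anchors added by the workflow template can be sketched as below. This is a minimal illustration, not the generator itself: `splitPrefixes`, `expandPatterns`, and `matchesAnchored` are hypothetical names, the prefix list is shortened, and the whole pattern is treated as a single regular expression, whereas `go test -run` splits patterns at each slash and matches the elements separately.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitPrefixes mirrors the shape of testsToSplit, shortened for illustration.
var splitPrefixes = map[string][]string{
	"TestAutoApproveMultiNetwork": {"authkey-tag", "webauth-user"},
}

// expandPatterns returns one "Test/prefix.*" pattern per prefix for tests
// listed in splitPrefixes and passes every other test name through unchanged.
func expandPatterns(tests []string) []string {
	var out []string
	for _, t := range tests {
		prefixes, ok := splitPrefixes[t]
		if !ok {
			out = append(out, t)
			continue
		}
		for _, p := range prefixes {
			out = append(out, fmt.Sprintf("%s/%s.*", t, p))
		}
	}
	return out
}

// matchesAnchored reports whether name matches pattern once the pattern is
// wrapped in ^...$, the way the workflow template wraps inputs.test.
// Simplification: the full name is matched as one string.
func matchesAnchored(pattern, name string) bool {
	return regexp.MustCompile("^" + pattern + "$").MatchString(name)
}

func main() {
	name := "TestAutoApproveMultiNetwork/authkey-tag-advertiseduringup-false-pol-database"
	patterns := expandPatterns([]string{"TestAutoApproveMultiNetwork", "TestACLAutogroupSelf"})
	for _, p := range patterns {
		fmt.Printf("%s matches: %v\n", p, matchesAnchored(p, name))
	}
}
```

Only the "authkey-tag.*" pattern matches the sample subtest name; dropping the ".*" suffix makes the anchored match fail, which is the case the comment in expandTests guards against.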
--- a/.github/workflows/gh-actions-updater.yaml
+++ b/.github/workflows/gh-actions-updater.yaml
@@ -11,13 +11,13 @@ jobs:
    runs-on: ubuntu-latest

    steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          # [Required] Access token with `workflow` scope.
          token: ${{ secrets.WORKFLOW_SECRET }}

      - name: Run GitHub Actions Version Updater
-        uses: saadmk11/github-actions-version-updater@64be81ba69383f81f2be476703ea6570c4c8686e # v0.8.1
+        uses: saadmk11/github-actions-version-updater@d8781caf11d11168579c8e5e94f62b068038f442 # v0.9.0
        with:
          # [Required] Access token with `workflow` scope.
          token: ${{ secrets.WORKFLOW_SECRET }}
--- a/.github/workflows/integration-test-template.yml
+++ b/.github/workflows/integration-test-template.yml
@@ -16,7 +16,7 @@ on:

 jobs:
  test:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-24.04-arm
    env:
      # Github does not allow us to access secrets in pull requests,
      # so this env var is used to check if we have the secret or not.
@@ -28,23 +28,12 @@ jobs:
      # that triggered the build.
      HAS_TAILSCALE_SECRET: ${{ secrets.TS_OAUTH_CLIENT_ID }}
    steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          fetch-depth: 2
-      - name: Get changed files
-        id: changed-files
-        uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
-        with:
-          filters: |
-            files:
-              - '*.nix'
-              - 'go.*'
-              - '**/*.go'
-              - 'integration_test/'
-              - 'config-example.yaml'
      - name: Tailscale
        if: ${{ env.HAS_TAILSCALE_SECRET }}
-        uses: tailscale/github-action@6986d2c82a91fbac2949fe01f5bab95cf21b5102 # v3.2.2
+        uses: tailscale/github-action@a392da0a182bba0e9613b6243ebd69529b1878aa # v4.1.0
        with:
          oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
@@ -52,44 +41,90 @@ jobs:
      - name: Setup SSH server for Actor
        if: ${{ env.HAS_TAILSCALE_SECRET }}
        uses: alexellis/setup-sshd-actor@master
-      - uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
-        if: steps.changed-files.outputs.files == 'true'
-      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
-        if: steps.changed-files.outputs.files == 'true'
+      - name: Download headscale image
+        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
        with:
-          primary-key:
-            nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
-            '**/flake.lock') }}
-          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
+          name: headscale-image
+          path: /tmp/artifacts
+      - name: Download tailscale HEAD image
+        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
+        with:
+          name: tailscale-head-image
+          path: /tmp/artifacts
+      - name: Download hi binary
+        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
+        with:
+          name: hi-binary
+          path: /tmp/artifacts
+      - name: Download Go cache
+        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
+        with:
+          name: go-cache
+          path: /tmp/artifacts
+      - name: Download postgres image
+        if: ${{ inputs.postgres_flag == '--postgres=1' }}
+        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
+        with:
+          name: postgres-image
+          path: /tmp/artifacts
+      - name: Pin Docker to v28 (avoid v29 breaking changes)
+        run: |
+          # Docker 29 breaks docker build via Go client libraries and
+          # docker load/save with certain tarball formats.
+          # Pin to Docker 28.x until our tooling is updated.
+          # https://github.com/actions/runner-images/issues/13474
+          sudo install -m 0755 -d /etc/apt/keyrings
+          curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
+            | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
+          echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
+            https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
+            | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+          sudo apt-get update -qq
+          VERSION=$(apt-cache madison docker-ce | grep '28\.5' | head -1 | awk '{print $3}')
+          sudo apt-get install -y --allow-downgrades \
+            "docker-ce=${VERSION}" "docker-ce-cli=${VERSION}"
+          sudo systemctl restart docker
+          docker version
+      - name: Load Docker images, Go cache, and prepare binary
+        run: |
+          gunzip -c /tmp/artifacts/headscale-image.tar.gz | docker load
+          gunzip -c /tmp/artifacts/tailscale-head-image.tar.gz | docker load
+          if [ -f /tmp/artifacts/postgres-image.tar.gz ]; then
+            gunzip -c /tmp/artifacts/postgres-image.tar.gz | docker load
+          fi
+          chmod +x /tmp/artifacts/hi
+          docker images
+          # Extract Go cache to host directories for bind mounting
+          mkdir -p /tmp/go-cache
+          tar -xzf /tmp/artifacts/go-cache.tar.gz -C /tmp/go-cache
+          ls -la /tmp/go-cache/ /tmp/go-cache/.cache/
      - name: Run Integration Test
-        uses: Wandalen/wretry.action@e68c23e6309f2871ca8ae4763e7629b9c258e1ea # v3.8.0
-        if: steps.changed-files.outputs.files == 'true'
+        env:
+          HEADSCALE_INTEGRATION_HEADSCALE_IMAGE: headscale:${{ github.sha }}
+          HEADSCALE_INTEGRATION_TAILSCALE_IMAGE: tailscale-head:${{ github.sha }}
+          HEADSCALE_INTEGRATION_POSTGRES_IMAGE: ${{ inputs.postgres_flag == '--postgres=1' && format('postgres:{0}', github.sha) || '' }}
+          HEADSCALE_INTEGRATION_GO_CACHE: /tmp/go-cache/go
+          HEADSCALE_INTEGRATION_GO_BUILD_CACHE: /tmp/go-cache/.cache/go-build
+        run: /tmp/artifacts/hi run --stats --ts-memory-limit=300 --hs-memory-limit=1500 "^${{ inputs.test }}$" \
+          --timeout=120m \
+          ${{ inputs.postgres_flag }}
+      # Sanitize test name for artifact upload (replace invalid characters: " : < > | * ? \ / with -)
+      - name: Sanitize test name for artifacts
+        if: always()
+        id: sanitize
+        run: echo "name=${TEST_NAME//[\":<>|*?\\\/]/-}" >> $GITHUB_OUTPUT
+        env:
+          TEST_NAME: ${{ inputs.test }}
+      - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        if: always()
        with:
-          # Our integration tests are started like a thundering herd, often
-          # hitting limits of the various external repositories we depend on
-          # like docker hub. This will retry jobs every 5 min, 10 times,
-          # hopefully letting us avoid manual intervention and restarting jobs.
-          # One could of course argue that we should invest in trying to avoid
-          # this, but currently it seems like a larger investment to be cleverer
-          # about this.
-          # Some of the jobs might still require manual restart as they are really
-          # slow and this will cause them to eventually be killed by Github actions.
-          attempt_delay: 300000 # 5 min
-          attempt_limit: 2
-          command: |
-            nix develop --command -- hi run "^${{ inputs.test }}$" \
-              --timeout=120m \
-              ${{ inputs.postgres_flag }}
-      - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
-        if: always() && steps.changed-files.outputs.files == 'true'
-        with:
-          name: ${{ inputs.database_name }}-${{ inputs.test }}-logs
+          name: ${{ inputs.database_name }}-${{ steps.sanitize.outputs.name }}-logs
          path: "control_logs/*/*.log"
-      - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
-        if: always() && steps.changed-files.outputs.files == 'true'
+      - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        if: always()
        with:
-          name: ${{ inputs.database_name }}-${{ inputs.test }}-archives
-          path: "control_logs/*/*.tar"
+          name: ${{ inputs.database_name }}-${{ steps.sanitize.outputs.name }}-artifacts
+          path: control_logs/
      - name: Setup a blocking tmux session
        if: ${{ env.HAS_TAILSCALE_SECRET }}
        uses: alexellis/block-with-tmux-action@master
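The "Sanitize test name for artifacts" step matters because split test patterns such as "TestAutoApproveMultiNetwork/authkey-tag.*" contain characters that are not allowed in artifact names. The same substitution as the shell parameter expansion can be sketched in Go as below; `sanitizeArtifactName` and `invalidArtifactChars` are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// invalidArtifactChars lists the characters the workflow step replaces:
// " : < > | * ? \ /
const invalidArtifactChars = "\":<>|*?\\/"

// sanitizeArtifactName replaces every listed character with "-" and leaves
// all other characters, including ".", untouched.
func sanitizeArtifactName(name string) string {
	return strings.Map(func(r rune) rune {
		if strings.ContainsRune(invalidArtifactChars, r) {
			return '-'
		}
		return r
	}, name)
}

func main() {
	fmt.Println(sanitizeArtifactName("TestAutoApproveMultiNetwork/authkey-tag.*"))
	// prints "TestAutoApproveMultiNetwork-authkey-tag.-"
}
```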
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -10,7 +10,7 @@ jobs:
  golangci-lint:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          fetch-depth: 2
      - name: Get changed files
@@ -24,13 +24,12 @@ jobs:
              - '**/*.go'
              - 'integration_test/'
              - 'config-example.yaml'
-      - uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
        if: steps.changed-files.outputs.files == 'true'
      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
        if: steps.changed-files.outputs.files == 'true'
        with:
-          primary-key:
-            nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
            '**/flake.lock') }}
          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

@@ -38,12 +37,15 @@ jobs:
        if: steps.changed-files.outputs.files == 'true'
        run: nix develop --command -- golangci-lint run
          --new-from-rev=${{github.event.pull_request.base.sha}}
-          --format=colored-line-number
+          --output.text.path=stdout
+          --output.text.print-linter-name
+          --output.text.print-issued-lines
+          --output.text.colors

  prettier-lint:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          fetch-depth: 2
      - name: Get changed files
@@ -62,13 +64,12 @@ jobs:
              - '**/*.css'
              - '**/*.scss'
              - '**/*.html'
-      - uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
        if: steps.changed-files.outputs.files == 'true'
      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
        if: steps.changed-files.outputs.files == 'true'
        with:
-          primary-key:
-            nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
            '**/flake.lock') }}
          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

@@ -80,12 +81,11 @@ jobs:
  proto-lint:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
-      - uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
        with:
-          primary-key:
-            nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
            '**/flake.lock') }}
          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

--- a/.github/workflows/needs-more-info-comment.yml
+++ b/.github/workflows/needs-more-info-comment.yml
@@ -0,0 +1,28 @@
+name: Needs More Info - Post Comment
+
+on:
+  issues:
+    types: [labeled]
+
+jobs:
+  post-comment:
+    if: >-
+      github.event.label.name == 'needs-more-info' &&
+      github.repository == 'juanfont/headscale'
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+      contents: read
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          sparse-checkout: .github/label-response/needs-more-info.md
+          sparse-checkout-cone-mode: false
+
+      - name: Post instruction comment
+        run: gh issue comment "$NUMBER" --body-file .github/label-response/needs-more-info.md
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GH_REPO: ${{ github.repository }}
+          NUMBER: ${{ github.event.issue.number }}
--- a/.github/workflows/needs-more-info-timer.yml
+++ b/.github/workflows/needs-more-info-timer.yml
@@ -0,0 +1,98 @@
+name: Needs More Info - Timer
+
+on:
+  schedule:
+    - cron: "0 0 * * *" # Daily at midnight UTC
+  issue_comment:
+    types: [created]
+  workflow_dispatch:
+
+jobs:
+  # When a non-bot user comments on a needs-more-info issue, remove the label.
+  remove-label-on-response:
+    if: >-
+      github.repository == 'juanfont/headscale' &&
+      github.event_name == 'issue_comment' &&
+      github.event.comment.user.type != 'Bot' &&
+      contains(github.event.issue.labels.*.name, 'needs-more-info')
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+    steps:
+      - name: Remove needs-more-info label
+        run: gh issue edit "$NUMBER" --remove-label needs-more-info
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GH_REPO: ${{ github.repository }}
+          NUMBER: ${{ github.event.issue.number }}
+
+  # On schedule, close issues that have had no human response for 3 days.
+  close-stale:
+    if: >-
+      github.repository == 'juanfont/headscale' &&
+      github.event_name != 'issue_comment'
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+    steps:
+      - uses: hustcer/setup-nu@920172d92eb04671776f3ba69d605d3b09351c30 # v3.22
+        with:
+          version: "*"
+
+      - name: Close stale needs-more-info issues
+        shell: nu {0}
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GH_REPO: ${{ github.repository }}
+        run: |
+          let issues = (gh issue list
+            --repo $env.GH_REPO
+            --label "needs-more-info"
+            --state open
+            --json number
+            | from json)
+
+          for issue in $issues {
+            let number = $issue.number
+            print $"Checking issue #($number)"
+
+            # Find when needs-more-info was last added
+            let events = (gh api $"repos/($env.GH_REPO)/issues/($number)/events"
+              --paginate | from json | flatten)
+            let label_event = ($events
+              | where event == "labeled" and label.name == "needs-more-info"
+              | last)
+            let label_added_at = ($label_event.created_at | into datetime)
+
+            # Check for non-bot comments after the label was added
+            let comments = (gh api $"repos/($env.GH_REPO)/issues/($number)/comments"
+              --paginate | from json | flatten)
+            let human_responses = ($comments
+              | where user.type != "Bot"
+              | where { ($in.created_at | into datetime) > $label_added_at })
+
+            if ($human_responses | length) > 0 {
+              print $"  Human responded, removing label"
+              gh issue edit $number --repo $env.GH_REPO --remove-label needs-more-info
+              continue
+            }
+
+            # Check if 3 days have passed
+            let elapsed = (date now) - $label_added_at
+            if $elapsed < 3day {
+              print $"  Only ($elapsed | format duration day) elapsed, skipping"
+              continue
+            }
+
+            print $"  No response for ($elapsed | format duration day), closing"
+            let message = [
+              "This issue has been automatically closed because no additional information was provided within 3 days."
+              ""
+              "If you have the requested information, please open a new issue and include the debug information requested above."
+              ""
+              "Thank you for your understanding."
+            ] | str join "\n"
+            gh issue comment $number --repo $env.GH_REPO --body $message
+            gh issue close $number --repo $env.GH_REPO --reason "not planned"
+            gh issue edit $number --repo $env.GH_REPO --remove-label needs-more-info
+          }
--- a/.github/workflows/nix-module-test.yml
+++ b/.github/workflows/nix-module-test.yml
@@ -0,0 +1,55 @@
+name: NixOS Module Tests
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  nix-module-check:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+
+    steps:
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
+        with:
+          fetch-depth: 2
+
+      - name: Get changed files
+        id: changed-files
+        uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
+        with:
+          filters: |
+            nix:
+              - 'nix/**'
+              - 'flake.nix'
+              - 'flake.lock'
+            go:
+              - 'go.*'
+              - '**/*.go'
+              - 'cmd/**'
+              - 'hscontrol/**'
+
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
+        if: steps.changed-files.outputs.nix == 'true' || steps.changed-files.outputs.go == 'true'
+
+      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
+        if: steps.changed-files.outputs.nix == 'true' || steps.changed-files.outputs.go == 'true'
+        with:
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+            '**/flake.lock') }}
+          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
+
+      - name: Run NixOS module tests
+        if: steps.changed-files.outputs.nix == 'true' || steps.changed-files.outputs.go == 'true'
+        run: |
+          echo "Running NixOS module integration test..."
+          nix build .#checks.x86_64-linux.headscale -L
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -13,28 +13,46 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
-        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          fetch-depth: 0

+      - name: Pin Docker to v28 (avoid v29 breaking changes)
+        run: |
+          # Docker 29 breaks docker build via Go client libraries and
+          # docker load/save with certain tarball formats.
+          # Pin to Docker 28.x until our tooling is updated.
+          # https://github.com/actions/runner-images/issues/13474
+          sudo install -m 0755 -d /etc/apt/keyrings
+          curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
+            | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
+          echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
+            https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
+            | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+          sudo apt-get update -qq
+          VERSION=$(apt-cache madison docker-ce | grep '28\.5' | head -1 | awk '{print $3}')
+          sudo apt-get install -y --allow-downgrades \
+            "docker-ce=${VERSION}" "docker-ce-cli=${VERSION}"
+          sudo systemctl restart docker
+          docker version
+
      - name: Login to DockerHub
-        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
+        uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Login to GHCR
-        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
+        uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}

-      - uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
        with:
-          primary-key:
-            nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
            '**/flake.lock') }}
          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

--- a/.github/workflows/stale.yml
+++ b/.github/workflows/stale.yml
@@ -12,18 +12,16 @@ jobs:
      issues: write
      pull-requests: write
    steps:
-      - uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0
+      - uses: actions/stale@997185467fa4f803885201cee163a9f38240193d # v10.1.1
        with:
          days-before-issue-stale: 90
          days-before-issue-close: 7
          stale-issue-label: "stale"
-          stale-issue-message:
-            "This issue is stale because it has been open for 90 days with no
+          stale-issue-message: "This issue is stale because it has been open for 90 days with no
            activity."
-          close-issue-message:
-            "This issue was closed because it has been inactive for 14 days
+          close-issue-message: "This issue was closed because it has been inactive for 7 days
            since being marked as stale."
          days-before-pr-stale: -1
          days-before-pr-close: -1
-          exempt-issue-labels: "no-stale-bot"
+          exempt-issue-labels: "no-stale-bot,needs-more-info"
          repo-token: ${{ secrets.GITHUB_TOKEN }}
--- a/.github/workflows/support-request.yml
+++ b/.github/workflows/support-request.yml
@@ -0,0 +1,30 @@
+name: Support Request - Close Issue
+
+on:
+  issues:
+    types: [labeled]
+
+jobs:
+  close-support-request:
+    if: >-
+      github.event.label.name == 'support-request' &&
+      github.repository == 'juanfont/headscale'
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+      contents: read
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          sparse-checkout: .github/label-response/support-request.md
+          sparse-checkout-cone-mode: false
+
+      - name: Post comment and close issue
+        run: |
+          gh issue comment "$NUMBER" --body-file .github/label-response/support-request.md
+          gh issue close "$NUMBER" --reason "not planned"
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GH_REPO: ${{ github.repository }}
+          NUMBER: ${{ github.event.issue.number }}
--- a/.github/workflows/test-integration.yaml
+++ b/.github/workflows/test-integration.yaml
@@ -7,7 +7,154 @@ concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true
 jobs:
+  # build: Builds binaries and Docker images once, uploads as artifacts for reuse.
+  # build-postgres: Pulls postgres image separately to avoid Docker Hub rate limits.
+  # sqlite: Runs all integration tests with SQLite backend.
+  # postgres: Runs a subset of tests with PostgreSQL to verify database compatibility.
+  build:
+    runs-on: ubuntu-24.04-arm
+    outputs:
+      files-changed: ${{ steps.changed-files.outputs.files }}
+    steps:
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
+        with:
+          fetch-depth: 2
+      - name: Get changed files
+        id: changed-files
+        uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # v3.0.2
+        with:
+          filters: |
+            files:
+              - '*.nix'
+              - 'go.*'
+              - '**/*.go'
+              - 'integration/**'
+              - 'config-example.yaml'
+              - '.github/workflows/test-integration.yaml'
+              - '.github/workflows/integration-test-template.yml'
+              - 'Dockerfile.*'
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
+        if: steps.changed-files.outputs.files == 'true'
+      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
+        if: steps.changed-files.outputs.files == 'true'
+        with:
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix', '**/flake.lock') }}
+          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}
+      - name: Build binaries and warm Go cache
+        if: steps.changed-files.outputs.files == 'true'
+        run: |
+          # Build all Go binaries in one nix shell to maximize cache reuse
+          nix develop --command -- bash -c '
+            go build -o hi ./cmd/hi
+            CGO_ENABLED=0 GOOS=linux go build -o headscale ./cmd/headscale
+            # Build integration test binary to warm the cache with all dependencies
+            go test -c ./integration -o /dev/null 2>/dev/null || true
+          '
+      - name: Upload hi binary
+        if: steps.changed-files.outputs.files == 'true'
+        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        with:
+          name: hi-binary
+          path: hi
+          retention-days: 10
+      - name: Package Go cache
+        if: steps.changed-files.outputs.files == 'true'
+        run: |
+          # Package Go module cache and build cache
+          tar -czf go-cache.tar.gz -C ~ go .cache/go-build
+      - name: Upload Go cache
+        if: steps.changed-files.outputs.files == 'true'
+        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        with:
+          name: go-cache
+          path: go-cache.tar.gz
+          retention-days: 10
+      - name: Pin Docker to v28 (avoid v29 breaking changes)
+        if: steps.changed-files.outputs.files == 'true'
+        run: |
+          # Docker 29 breaks docker build via Go client libraries and
+          # docker load/save with certain tarball formats.
+          # Pin to Docker 28.x until our tooling is updated.
+          # https://github.com/actions/runner-images/issues/13474
+          sudo install -m 0755 -d /etc/apt/keyrings
+          curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
+            | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
+          echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
+            https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
+            | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+          sudo apt-get update -qq
+          VERSION=$(apt-cache madison docker-ce | grep '28\.5' | head -1 | awk '{print $3}')
+          sudo apt-get install -y --allow-downgrades \
+            "docker-ce=${VERSION}" "docker-ce-cli=${VERSION}"
+          sudo systemctl restart docker
+          docker version
+      - name: Build headscale image
+        if: steps.changed-files.outputs.files == 'true'
+        run: |
+          docker build \
+            --file Dockerfile.integration-ci \
+            --tag headscale:${{ github.sha }} \
+            .
+          docker save headscale:${{ github.sha }} | gzip > headscale-image.tar.gz
+      - name: Build tailscale HEAD image
+        if: steps.changed-files.outputs.files == 'true'
+        run: |
+          docker build \
+            --file Dockerfile.tailscale-HEAD \
+            --tag tailscale-head:${{ github.sha }} \
+            .
+          docker save tailscale-head:${{ github.sha }} | gzip > tailscale-head-image.tar.gz
+      - name: Upload headscale image
+        if: steps.changed-files.outputs.files == 'true'
+        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        with:
+          name: headscale-image
+          path: headscale-image.tar.gz
+          retention-days: 10
+      - name: Upload tailscale HEAD image
+        if: steps.changed-files.outputs.files == 'true'
+        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        with:
+          name: tailscale-head-image
+          path: tailscale-head-image.tar.gz
+          retention-days: 10
+  build-postgres:
+    runs-on: ubuntu-24.04-arm
+    needs: build
+    if: needs.build.outputs.files-changed == 'true'
+    steps:
+      - name: Pin Docker to v28 (avoid v29 breaking changes)
+        run: |
+          # Docker 29 breaks docker build via Go client libraries and
+          # docker load/save with certain tarball formats.
+          # Pin to Docker 28.x until our tooling is updated.
+          # https://github.com/actions/runner-images/issues/13474
+          sudo install -m 0755 -d /etc/apt/keyrings
+          curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
+            | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
+          echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
+            https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
+            | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+          sudo apt-get update -qq
+          VERSION=$(apt-cache madison docker-ce | grep '28\.5' | head -1 | awk '{print $3}')
+          sudo apt-get install -y --allow-downgrades \
+            "docker-ce=${VERSION}" "docker-ce-cli=${VERSION}"
+          sudo systemctl restart docker
+          docker version
+      - name: Pull and save postgres image
+        run: |
+          docker pull postgres:latest
+          docker tag postgres:latest postgres:${{ github.sha }}
+          docker save postgres:${{ github.sha }} | gzip > postgres-image.tar.gz
+      - name: Upload postgres image
+        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        with:
+          name: postgres-image
+          path: postgres-image.tar.gz
+          retention-days: 10
  sqlite:
+    needs: build
+    if: needs.build.outputs.files-changed == 'true'
    strategy:
      fail-fast: false
      matrix:
@@ -23,28 +170,48 @@ jobs:
          - TestPolicyUpdateWhileRunningWithCLIInDatabase
          - TestACLAutogroupMember
          - TestACLAutogroupTagged
+          - TestACLAutogroupSelf
+          - TestACLPolicyPropagationOverTime
+          - TestACLTagPropagation
+          - TestACLTagPropagationPortSpecific
+          - TestACLGroupWithUnknownUser
+          - TestACLGroupAfterUserDeletion
+          - TestACLGroupDeletionExactReproduction
+          - TestACLDynamicUnknownUserAddition
+          - TestACLDynamicUnknownUserRemoval
+          - TestAPIAuthenticationBypass
+          - TestAPIAuthenticationBypassCurl
+          - TestGRPCAuthenticationBypass
+          - TestCLIWithConfigAuthenticationBypass
          - TestAuthKeyLogoutAndReloginSameUser
          - TestAuthKeyLogoutAndReloginNewUser
          - TestAuthKeyLogoutAndReloginSameUserExpiredKey
+          - TestAuthKeyDeleteKey
+          - TestAuthKeyLogoutAndReloginRoutesPreserved
          - TestOIDCAuthenticationPingAll
          - TestOIDCExpireNodesBasedOnTokenExpiry
          - TestOIDC024UserCreation
          - TestOIDCAuthenticationWithPKCE
          - TestOIDCReloginSameNodeNewUser
+          - TestOIDCFollowUpUrl
+          - TestOIDCMultipleOpenedLoginUrls
+          - TestOIDCReloginSameNodeSameUser
+          - TestOIDCExpiryAfterRestart
+          - TestOIDCACLPolicyOnJoin
+          - TestOIDCReloginSameUserRoutesPreserved
          - TestAuthWebFlowAuthenticationPingAll
-          - TestAuthWebFlowLogoutAndRelogin
+          - TestAuthWebFlowLogoutAndReloginSameUser
+          - TestAuthWebFlowLogoutAndReloginNewUser
          - TestUserCommand
          - TestPreAuthKeyCommand
          - TestPreAuthKeyCommandWithoutExpiry
          - TestPreAuthKeyCommandReusableEphemeral
          - TestPreAuthKeyCorrectUserLoggedInCommand
+          - TestTaggedNodesCLIOutput
          - TestApiKeyCommand
-          - TestNodeTagCommand
-          - TestNodeAdvertiseTagCommand
          - TestNodeCommand
          - TestNodeExpireCommand
          - TestNodeRenameCommand
-          - TestNodeMoveCommand
          - TestPolicyCommand
          - TestPolicyBrokenConfigCommand
          - TestDERPVerifyEndpoint
@@ -61,17 +228,27 @@ jobs:
          - TestTaildrop
          - TestUpdateHostnameFromClient
          - TestExpireNode
+          - TestSetNodeExpiryInFuture
+          - TestDisableNodeExpiry
          - TestNodeOnlineStatus
          - TestPingAllByIPManyUpDown
          - Test2118DeletingOnlineNodePanics
+          - TestGrantCapRelay
+          - TestGrantCapDrive
          - TestEnablingRoutes
          - TestHASubnetRouterFailover
          - TestSubnetRouteACL
          - TestEnablingExitRoutes
          - TestSubnetRouterMultiNetwork
          - TestSubnetRouterMultiNetworkExitNode
-          - TestAutoApproveMultiNetwork
+          - TestAutoApproveMultiNetwork/authkey-tag.*
+          - TestAutoApproveMultiNetwork/authkey-user.*
+          - TestAutoApproveMultiNetwork/authkey-group.*
+          - TestAutoApproveMultiNetwork/webauth-tag.*
+          - TestAutoApproveMultiNetwork/webauth-user.*
+          - TestAutoApproveMultiNetwork/webauth-group.*
          - TestSubnetRouteACLFiltering
+          - TestGrantViaSubnetSteering
          - TestHeadscale
          - TestTailscaleNodesJoiningHeadcale
          - TestSSHOneUserToAll
@@ -79,12 +256,55 @@ jobs:
          - TestSSHNoSSHConfigured
          - TestSSHIsBlockedInACL
          - TestSSHUserOnlyIsolation
+          - TestSSHAutogroupSelf
+          - TestSSHOneUserToOneCheckModeCLI
+          - TestSSHOneUserToOneCheckModeOIDC
+          - TestSSHCheckModeUnapprovedTimeout
+          - TestSSHCheckModeCheckPeriodCLI
+          - TestSSHCheckModeAutoApprove
+          - TestSSHCheckModeNegativeCLI
+          - TestSSHLocalpart
+          - TestTagsAuthKeyWithTagRequestDifferentTag
+          - TestTagsAuthKeyWithTagNoAdvertiseFlag
+          - TestTagsAuthKeyWithTagCannotAddViaCLI
+          - TestTagsAuthKeyWithTagCannotChangeViaCLI
+          - TestTagsAuthKeyWithTagAdminOverrideReauthPreserves
+          - TestTagsAuthKeyWithTagCLICannotModifyAdminTags
+          - TestTagsAuthKeyWithoutTagCannotRequestTags
+          - TestTagsAuthKeyWithoutTagRegisterNoTags
+          - TestTagsAuthKeyWithoutTagCannotAddViaCLI
+          - TestTagsAuthKeyWithoutTagCLINoOpAfterAdminWithReset
+          - TestTagsAuthKeyWithoutTagCLINoOpAfterAdminWithEmptyAdvertise
+          - TestTagsAuthKeyWithoutTagCLICannotReduceAdminMultiTag
+          - TestTagsUserLoginOwnedTagAtRegistration
+          - TestTagsUserLoginNonExistentTagAtRegistration
+          - TestTagsUserLoginUnownedTagAtRegistration
+          - TestTagsUserLoginAddTagViaCLIReauth
+          - TestTagsUserLoginRemoveTagViaCLIReauth
+          - TestTagsUserLoginCLINoOpAfterAdminAssignment
+          - TestTagsUserLoginCLICannotRemoveAdminTags
+          - TestTagsAuthKeyWithTagRequestNonExistentTag
+          - TestTagsAuthKeyWithTagRequestUnownedTag
+          - TestTagsAuthKeyWithoutTagRequestNonExistentTag
+          - TestTagsAuthKeyWithoutTagRequestUnownedTag
+          - TestTagsAdminAPICannotSetNonExistentTag
+          - TestTagsAdminAPICanSetUnownedTag
+          - TestTagsAdminAPICannotRemoveAllTags
+          - TestTagsIssue2978ReproTagReplacement
+          - TestTagsAdminAPICannotSetInvalidFormat
+          - TestTagsUserLoginReauthWithEmptyTagsRemovesAllTags
+          - TestTagsAuthKeyWithoutUserInheritsTags
+          - TestTagsAuthKeyWithoutUserRejectsAdvertisedTags
+          - TestTagsAuthKeyConvertToUserViaCLIRegister
    uses: ./.github/workflows/integration-test-template.yml
+    secrets: inherit
    with:
      test: ${{ matrix.test }}
      postgres_flag: "--postgres=0"
      database_name: "sqlite"
  postgres:
+    needs: [build, build-postgres]
+    if: needs.build.outputs.files-changed == 'true'
    strategy:
      fail-fast: false
      matrix:
@@ -95,6 +315,7 @@ jobs:
          - TestPingAllByIPManyUpDown
          - TestSubnetRouterMultiNetwork
    uses: ./.github/workflows/integration-test-template.yml
+    secrets: inherit
    with:
      test: ${{ matrix.test }}
      postgres_flag: "--postgres=1"
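Reviewer note on the matrix change above: `TestAutoApproveMultiNetwork` is now split into six entries of the form `TestName/prefix.*`. Assuming each entry ends up as a `go test -run` pattern (the template workflow is not shown in this diff), Go splits the pattern on `/` and matches each element, as an unanchored regular expression, against the corresponding level of the test name. The sketch below illustrates that rule only. `matchesRunPattern` and the subtest names are hypothetical, and the split ignores the bracket handling that `go test` performs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchesRunPattern (hypothetical helper) reports whether a slash-separated
// test name would be selected by a -run style pattern.
// Simplification: the pattern is split on every "/", whereas go test only
// splits on slashes that are not inside brackets.
func matchesRunPattern(pattern, name string) (bool, error) {
	patElems := strings.Split(pattern, "/")
	nameElems := strings.Split(name, "/")
	for i, pe := range patElems {
		if i >= len(nameElems) {
			// The name is shallower than the pattern: a parent test still
			// runs so that its subtests can be considered.
			return true, nil
		}
		re, err := regexp.Compile(pe)
		if err != nil {
			return false, err
		}
		if !re.MatchString(nameElems[i]) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	pattern := "TestAutoApproveMultiNetwork/authkey-tag.*"
	// Subtest names below are made up for illustration.
	names := []string{
		"TestAutoApproveMultiNetwork/authkey-tag-example",
		"TestAutoApproveMultiNetwork/webauth-user-example",
		"TestEnablingRoutes",
	}
	for _, n := range names {
		ok, err := matchesRunPattern(pattern, n)
		if err != nil {
			fmt.Println("bad pattern:", err)
			return
		}
		fmt.Printf("%-5t %s\n", ok, n)
	}
}
```

Splitting one large test into several matrix entries this way presumably lets each group of subtests run as its own job, which keeps individual jobs shorter and makes failures easier to attribute.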
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -11,7 +11,7 @@ jobs:
    runs-on: ubuntu-latest

    steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          fetch-depth: 2

@@ -27,13 +27,12 @@ jobs:
              - 'integration_test/'
              - 'config-example.yaml'

-      - uses: nixbuild/nix-quick-install-action@889f3180bb5f064ee9e3201428d04ae9e41d54ad # v31
+      - uses: nixbuild/nix-quick-install-action@2c9db80fb984ceb1bcaa77cdda3fdf8cfba92035 # v34
        if: steps.changed-files.outputs.files == 'true'
      - uses: nix-community/cache-nix-action@135667ec418502fa5a3598af6fb9eb733888ce6a # v6.1.3
        if: steps.changed-files.outputs.files == 'true'
        with:
-          primary-key:
-            nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
+          primary-key: nix-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('**/*.nix',
            '**/flake.lock') }}
          restore-prefixes-first-match: nix-${{ runner.os }}-${{ runner.arch }}

--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,10 @@
 ignored/
 tailscale/
 .vscode/
+.claude/
+logs/
+
+*.prof

 # Binaries for programs and plugins
 *.exe
@@ -25,6 +29,7 @@ config*.yaml
 !config-example.yaml
 derp.yaml
 *.hujson
+!hscontrol/policy/v2/testdata/*/*.hujson
 *.key
 /db.sqlite
 *.sqlite3
@@ -47,8 +52,6 @@ integration_test/etc/config.dump.yaml

 __debug_bin

-
 node_modules/
 package-lock.json
 package.json
-
--- a/.golangci.yaml
+++ b/.golangci.yaml
@@ -7,6 +7,7 @@ linters:
    - depguard
    - dupl
    - exhaustruct
+    - funcorder
    - funlen
    - gochecknoglobals
    - gochecknoinits
@@ -17,6 +18,7 @@ linters:
    - lll
    - maintidx
    - makezero
+    - mnd
    - musttag
    - nestif
    - nolintlint
@@ -28,6 +30,32 @@ linters:
    - wrapcheck
    - wsl
  settings:
+    forbidigo:
+      forbid:
+        # Forbid time.Sleep everywhere with context-appropriate alternatives
+        - pattern: 'time\.Sleep'
+          msg: >-
+            time.Sleep is forbidden.
+            In tests: use assert.EventuallyWithT for polling/waiting patterns.
+            In production code: use a backoff strategy (e.g., cenkalti/backoff) or proper synchronization primitives.
+        # Forbid inline string literals in zerolog field methods - use zf.* constants
+        - pattern: '\.(Str|Int|Int8|Int16|Int32|Int64|Uint|Uint8|Uint16|Uint32|Uint64|Float32|Float64|Bool|Dur|Time|TimeDiff|Strs|Ints|Uints|Floats|Bools|Any|Interface)\("[^"]+"'
+          msg: >-
+            Use zf.* constants for zerolog field names instead of string literals.
+            Import "github.com/juanfont/headscale/hscontrol/util/zlog/zf" and use
+            constants like zf.NodeID, zf.UserName, etc. Add new constants to
+            hscontrol/util/zlog/zf/fields.go if needed.
+        # Forbid ptr.To - use Go 1.26 new(expr) instead
+        - pattern: 'ptr\.To\('
+          msg: >-
+            ptr.To is forbidden. Use Go 1.26's new(expr) syntax instead.
+            Example: ptr.To(value) → new(value)
+        # Forbid tsaddr.SortPrefixes - use slices.SortFunc with netip.Prefix.Compare
+        - pattern: 'tsaddr\.SortPrefixes'
+          msg: >-
+            tsaddr.SortPrefixes is forbidden. Use Go 1.26's netip.Prefix.Compare instead.
+            Example: slices.SortFunc(prefixes, netip.Prefix.Compare)
+      analyze-types: true
    gocritic:
      disabled-checks:
        - appendAssign
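Reviewer note on the `forbidigo` hunk above: the zerolog rule is a plain regular expression, so its reach can be checked in isolation. The sketch below copies the pattern verbatim from the diff and runs it over a few sample call chains. The sample strings and the helper name `usesLiteralFieldName` are illustrative assumptions, and the sketch exercises only the regex itself, not how forbidigo selects the source text it matches against.

```go
package main

import (
	"fmt"
	"regexp"
)

// zerologLiteral is the pattern from the forbidigo rule, copied verbatim.
var zerologLiteral = regexp.MustCompile(
	`\.(Str|Int|Int8|Int16|Int32|Int64|Uint|Uint8|Uint16|Uint32|Uint64|Float32|Float64|Bool|Dur|Time|TimeDiff|Strs|Ints|Uints|Floats|Bools|Any|Interface)\("[^"]+"`)

// usesLiteralFieldName (hypothetical helper) reports whether src contains a
// zerolog-style field call whose first argument is a non-empty string literal.
func usesLiteralFieldName(src string) bool {
	return zerologLiteral.MatchString(src)
}

func main() {
	// Illustrative snippets only; they are not taken from the headscale code base.
	samples := []string{
		`log.Info().Str("node_id", id).Msg("registered")`, // literal field name
		`log.Info().Str(zf.NodeID, id).Msg("registered")`, // constant field name
		`log.Info().Msg("registered")`,                    // Msg is not a field method
	}
	for _, s := range samples {
		fmt.Printf("%-5t %s\n", usesLiteralFieldName(s), s)
	}
}
```

Because `Msg` is not in the alternation, message strings stay unrestricted; only field-name arguments written as string literals are flagged.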
--- a/.goreleaser.yml
+++ b/.goreleaser.yml
@@ -2,12 +2,16 @@
 version: 2
 before:
  hooks:
-    - go mod tidy -compat=1.24
+    - go mod tidy -compat=1.26
    - go mod vendor

 release:
  prerelease: auto
  draft: true
+  header: |
+    ## Upgrade
+
+    Please follow the steps outlined in the [upgrade guide](https://headscale.net/stable/setup/upgrade/) to update your existing Headscale installation.

 builds:
  - id: headscale
@@ -19,20 +23,10 @@ builds:
      - darwin_amd64
      - darwin_arm64
      - freebsd_amd64
-      - linux_386
      - linux_amd64
      - linux_arm64
-      - linux_arm_5
-      - linux_arm_6
-      - linux_arm_7
    flags:
      - -mod=readonly
-    ldflags:
-      - -s -w
-      - -X github.com/juanfont/headscale/hscontrol/types.Version={{ .Version }}
-      - -X github.com/juanfont/headscale/hscontrol/types.GitCommitHash={{ .Commit }}
-    tags:
-      - ts2019

 archives:
  - id: golang-cross
@@ -106,16 +100,14 @@ kos:
    # bare tells KO to only use the repository
    # for tagging and naming the container.
    bare: true
-    base_image: gcr.io/distroless/base-debian12
+    base_image: gcr.io/distroless/base-debian13
    build: headscale
    main: ./cmd/headscale
    env:
      - CGO_ENABLED=0
    platforms:
      - linux/amd64
-      - linux/386
      - linux/arm64
-      - linux/arm/v7
    tags:
      - "{{ if not .Prerelease }}latest{{ end }}"
      - "{{ if not .Prerelease }}{{ .Major }}.{{ .Minor }}.{{ .Patch }}{{ end }}"
@@ -128,6 +120,8 @@ kos:
      - "{{ .Tag }}"
      - '{{ trimprefix .Tag "v" }}'
      - "sha-{{ .ShortCommit }}"
+    creation_time: "{{.CommitTimestamp}}"
+    ko_data_creation_time: "{{.CommitTimestamp}}"

  - id: ghcr-debug
    repositories:
@@ -135,16 +129,14 @@ kos:
      - headscale/headscale

    bare: true
-    base_image: gcr.io/distroless/base-debian12:debug
+    base_image: gcr.io/distroless/base-debian13:debug
    build: headscale
    main: ./cmd/headscale
    env:
      - CGO_ENABLED=0
    platforms:
      - linux/amd64
-      - linux/386
      - linux/arm64
-      - linux/arm/v7
    tags:
      - "{{ if not .Prerelease }}latest-debug{{ end }}"
      - "{{ if not .Prerelease }}{{ .Major }}.{{ .Minor }}.{{ .Patch }}-debug{{ end }}"
--- a/.mcp.json
+++ b/.mcp.json
@@ -0,0 +1,34 @@
+{
+  "mcpServers": {
+    "claude-code-mcp": {
+      "type": "stdio",
+      "command": "npx",
+      "args": ["-y", "@steipete/claude-code-mcp@latest"],
+      "env": {}
+    },
+    "sequential-thinking": {
+      "type": "stdio",
+      "command": "npx",
+      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"],
+      "env": {}
+    },
+    "nixos": {
+      "type": "stdio",
+      "command": "uvx",
+      "args": ["mcp-nixos"],
+      "env": {}
+    },
+    "context7": {
+      "type": "stdio",
+      "command": "npx",
+      "args": ["-y", "@upstash/context7-mcp"],
+      "env": {}
+    },
+    "git": {
+      "type": "stdio",
+      "command": "npx",
+      "args": ["-y", "@cyanheads/git-mcp-server"],
+      "env": {}
+    }
+  }
+}
--- a/.mdformat.toml
+++ b/.mdformat.toml
@@ -0,0 +1,2 @@
+[plugin.mkdocs]
+align_semantic_breaks_in_lists = true
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,62 @@
+# prek/pre-commit configuration for headscale
+# See: https://prek.j178.dev/quickstart/
+# See: https://prek.j178.dev/builtin/
+
+# Global exclusions - ignore generated code
+exclude: ^gen/
+
+repos:
+  # Built-in hooks from pre-commit/pre-commit-hooks
+  # prek will use fast-path optimized versions automatically
+  # See: https://prek.j178.dev/builtin/
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v6.0.0
+    hooks:
+      - id: check-added-large-files
+      - id: check-case-conflict
+      - id: check-executables-have-shebangs
+      - id: check-json
+      - id: check-merge-conflict
+      - id: check-symlinks
+      - id: check-toml
+      - id: check-xml
+      - id: check-yaml
+      - id: detect-private-key
+      - id: end-of-file-fixer
+      - id: fix-byte-order-marker
+      - id: mixed-line-ending
+      - id: trailing-whitespace
+
+  # Local hooks for project-specific tooling
+  - repo: local
+    hooks:
+      # nixpkgs-fmt for Nix files
+      - id: nixpkgs-fmt
+        name: nixpkgs-fmt
+        entry: nixpkgs-fmt
+        language: system
+        files: \.nix$
+
+      # Prettier for formatting
+      - id: prettier
+        name: prettier
+        entry: prettier --write --list-different
+        language: system
+        exclude: ^docs/
+        types_or: [javascript, jsx, ts, tsx, yaml, json, toml, html, css, scss, sass, markdown]
+
+      # mdformat for docs
+      - id: mdformat
+        name: mdformat
+        entry: mdformat
+        language: system
+        types_or: [markdown]
+        files: ^docs/
+
+      # golangci-lint for Go code quality
+      - id: golangci-lint
+        name: golangci-lint
+        entry: nix develop --command -- golangci-lint run --new-from-rev=HEAD~1 --timeout=5m --fix
+        language: system
+        types: [go]
+        pass_filenames: false
--- a/.prettierignore
+++ b/.prettierignore
@@ -1,5 +1,2 @@
 .github/workflows/test-integration-v2*
-docs/about/features.md
-docs/ref/configuration.md
-docs/ref/oidc.md
-docs/ref/remote-cli.md
+docs/
--- a/AGENTS.md
+++ b/AGENTS.md
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,395 +1 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-## Overview
-
-Headscale is an open-source implementation of the Tailscale control server written in Go. It provides self-hosted coordination for Tailscale networks (tailnets), managing node registration, IP allocation, policy enforcement, and DERP routing.
-
-## Development Commands
-
-### Quick Setup
-```bash
-# Recommended: Use Nix for dependency management
-nix develop
-
-# Full development workflow
-make dev  # runs fmt + lint + test + build
-```
-
-### Essential Commands
-```bash
-# Build headscale binary
-make build
-
-# Run tests
-make test
-go test ./...                    # All unit tests
-go test -race ./...              # With race detection
-
-# Run specific integration test
-go run ./cmd/hi run "TestName" --postgres
-
-# Code formatting and linting
-make fmt         # Format all code (Go, docs, proto)
-make lint        # Lint all code (Go, proto)
-make fmt-go      # Format Go code only
-make lint-go     # Lint Go code only
-
-# Protocol buffer generation (after modifying proto/)
-make generate
-
-# Clean build artifacts  
-make clean
-```
-
-### Integration Testing
-```bash
-# Use the hi (Headscale Integration) test runner
-go run ./cmd/hi doctor                    # Check system requirements
-go run ./cmd/hi run "TestPattern"         # Run specific test
-go run ./cmd/hi run "TestPattern" --postgres  # With PostgreSQL backend
-
-# Test artifacts are saved to control_logs/ with logs and debug data
-```
-
-## Project Structure & Architecture
-
-### Top-Level Organization
-
-```
-headscale/
-├── cmd/                    # Command-line applications
-│   ├── headscale/         # Main headscale server binary
-│   └── hi/               # Headscale Integration test runner
-├── hscontrol/            # Core control plane logic
-├── integration/          # End-to-end Docker-based tests
-├── proto/               # Protocol buffer definitions
-├── gen/                 # Generated code (protobuf)
-├── docs/                # Documentation
-└── packaging/           # Distribution packaging
-```
-
-### Core Packages (`hscontrol/`)
-
-**Main Server (`hscontrol/`)**
-- `app.go`: Application setup, dependency injection, server lifecycle
-- `handlers.go`: HTTP/gRPC API endpoints for management operations
-- `grpcv1.go`: gRPC service implementation for headscale API
-- `poll.go`: **Critical** - Handles Tailscale MapRequest/MapResponse protocol
-- `noise.go`: Noise protocol implementation for secure client communication
-- `auth.go`: Authentication flows (web, OIDC, command-line)
-- `oidc.go`: OpenID Connect integration for user authentication
-
-**State Management (`hscontrol/state/`)**
-- `state.go`: Central coordinator for all subsystems (database, policy, IP allocation, DERP)
-- `node_store.go`: **Performance-critical** - In-memory cache with copy-on-write semantics
-- Thread-safe operations with deadlock detection
-- Coordinates between database persistence and real-time operations
-
-**Database Layer (`hscontrol/db/`)**
-- `db.go`: Database abstraction, GORM setup, migration management
-- `node.go`: Node lifecycle, registration, expiration, IP assignment
-- `users.go`: User management, namespace isolation
-- `api_key.go`: API authentication tokens
-- `preauth_keys.go`: Pre-authentication keys for automated node registration
-- `ip.go`: IP address allocation and management
-- `policy.go`: Policy storage and retrieval
-- Schema migrations in `schema.sql` with extensive test data coverage
-
-**Policy Engine (`hscontrol/policy/`)**
-- `policy.go`: Core ACL evaluation logic, HuJSON parsing
-- `v2/`: Next-generation policy system with improved filtering
-- `matcher/`: ACL rule matching and evaluation engine
-- Determines peer visibility, route approval, and network access rules
-- Supports both file-based and database-stored policies
-
-**Network Management (`hscontrol/`)**
-- `derp/`: DERP (Designated Encrypted Relay for Packets) server implementation
-  - NAT traversal when direct connections fail
-  - Fallback relay for firewall-restricted environments
-- `mapper/`: Converts internal Headscale state to Tailscale's wire protocol format
-  - `tail.go`: Tailscale-specific data structure generation
-- `routes/`: Subnet route management and primary route selection
-- `dns/`: DNS record management and MagicDNS implementation
-
-**Utilities & Support (`hscontrol/`)**
-- `types/`: Core data structures, configuration, validation
-- `util/`: Helper functions for networking, DNS, key management
-- `templates/`: Client configuration templates (Apple, Windows, etc.)
-- `notifier/`: Event notification system for real-time updates
-- `metrics.go`: Prometheus metrics collection
-- `capver/`: Tailscale capability version management
-
-### Key Subsystem Interactions
-
-**Node Registration Flow**
-1. **Client Connection**: `noise.go` handles secure protocol handshake
-2. **Authentication**: `auth.go` validates credentials (web/OIDC/preauth)
-3. **State Creation**: `state.go` coordinates IP allocation via `db/ip.go`
-4. **Storage**: `db/node.go` persists node, `NodeStore` caches in memory
-5. **Network Setup**: `mapper/` generates initial Tailscale network map
-
-**Ongoing Operations**
-1. **Poll Requests**: `poll.go` receives periodic client updates
-2. **State Updates**: `NodeStore` maintains real-time node information
-3. **Policy Application**: `policy/` evaluates ACL rules for peer relationships
-4. **Map Distribution**: `mapper/` sends network topology to all affected clients
-
-**Route Management**
-1. **Advertisement**: Clients announce routes via `poll.go` Hostinfo updates
-2. **Storage**: `db/` persists routes, `NodeStore` caches for performance
-3. **Approval**: `policy/` auto-approves routes based on ACL rules
-4. **Distribution**: `routes/` selects primary routes, `mapper/` distributes to peers
-
-### Command-Line Tools (`cmd/`)
-
-**Main Server (`cmd/headscale/`)**
-- `headscale.go`: CLI parsing, configuration loading, server startup
-- Supports daemon mode, CLI operations (user/node management), database operations
-
-**Integration Test Runner (`cmd/hi/`)**
-- `main.go`: Test execution framework with Docker orchestration
-- `run.go`: Individual test execution with artifact collection
-- `doctor.go`: System requirements validation
-- `docker.go`: Container lifecycle management
-- Essential for validating changes against real Tailscale clients
-
-### Generated & External Code
-
-**Protocol Buffers (`proto/` → `gen/`)**
-- Defines gRPC API for headscale management operations
-- Client libraries can generate from these definitions
-- Run `make generate` after modifying `.proto` files
-
-**Integration Testing (`integration/`)**
-- `scenario.go`: Docker test environment setup
-- `tailscale.go`: Tailscale client container management
-- Individual test files for specific functionality areas
-- Real end-to-end validation with network isolation
-
-### Critical Performance Paths
-
-**High-Frequency Operations**
-1. **MapRequest Processing** (`poll.go`): Every 15-60 seconds per client
-2. **NodeStore Reads** (`node_store.go`): Every operation requiring node data
-3. **Policy Evaluation** (`policy/`): On every peer relationship calculation
-4. **Route Lookups** (`routes/`): During network map generation
-
-**Database Write Patterns**
-- **Frequent**: Node heartbeats, endpoint updates, route changes
-- **Moderate**: User operations, policy updates, API key management
-- **Rare**: Schema migrations, bulk operations
-
-### Configuration & Deployment
-
-**Configuration** (`hscontrol/types/config.go`)**
-- Database connection settings (SQLite/PostgreSQL)
-- Network configuration (IP ranges, DNS settings)
-- Policy mode (file vs database)
-- DERP relay configuration
-- OIDC provider settings
-
-**Key Dependencies**
-- **GORM**: Database ORM with migration support
-- **Tailscale Libraries**: Core networking and protocol code
-- **Zerolog**: Structured logging throughout the application
-- **Buf**: Protocol buffer toolchain for code generation
-
-### Development Workflow Integration
-
-The architecture supports incremental development:
-- **Unit Tests**: Focus on individual packages (`*_test.go` files)
-- **Integration Tests**: Validate cross-component interactions
-- **Database Tests**: Extensive migration and data integrity validation
-- **Policy Tests**: ACL rule evaluation and edge cases
-- **Performance Tests**: NodeStore and high-frequency operation validation
-
-## Integration Test System
-
-### Overview
-Integration tests use Docker containers running real Tailscale clients against a Headscale server. Tests validate end-to-end functionality including routing, ACLs, node lifecycle, and network coordination.
-
-### Running Integration Tests
-
-**System Requirements**
-```bash
-# Check if your system is ready
-go run ./cmd/hi doctor
-```
-This verifies Docker, Go, required images, and disk space.
-
-**Test Execution Patterns**
-```bash
-# Run a single test (recommended for development)
-go run ./cmd/hi run "TestSubnetRouterMultiNetwork"
-
-# Run with PostgreSQL backend (for database-heavy tests)
-go run ./cmd/hi run "TestExpireNode" --postgres
-
-# Run multiple tests with pattern matching
-go run ./cmd/hi run "TestSubnet*"
-
-# Run all integration tests (CI/full validation)
-go test ./integration -timeout 30m
-```
-
-**Test Categories & Timing**
-- **Fast tests** (< 2 min): Basic functionality, CLI operations
-- **Medium tests** (2-5 min): Route management, ACL validation  
-- **Slow tests** (5+ min): Node expiration, HA failover
-- **Long-running tests** (10+ min): `TestNodeOnlineStatus` (12 min duration)
-
-### Test Infrastructure
-
-**Docker Setup**
-- Headscale server container with configurable database backend
-- Multiple Tailscale client containers with different versions
-- Isolated networks per test scenario
-- Automatic cleanup after test completion
-
-**Test Artifacts**
-All test runs save artifacts to `control_logs/TIMESTAMP-ID/`:
-```
-control_logs/20250713-213106-iajsux/
-├── hs-testname-abc123.stderr.log     # Headscale server logs
-├── hs-testname-abc123.stdout.log
-├── hs-testname-abc123.db             # Database snapshot
-├── hs-testname-abc123_metrics.txt    # Prometheus metrics
-├── hs-testname-abc123-mapresponses/  # Protocol debug data
-├── ts-client-xyz789.stderr.log       # Tailscale client logs
-├── ts-client-xyz789.stdout.log
-└── ts-client-xyz789_status.json      # Client status dump
-```
-
-### Test Development Guidelines
-
-**Timing Considerations**
-Integration tests involve real network operations and Docker container lifecycle:
-
-```go
-// ❌ Wrong: Immediate assertions after async operations
-client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
-nodes, _ := headscale.ListNodes()
-require.Len(t, nodes[0].GetAvailableRoutes(), 1) // May fail due to timing
-
-// ✅ Correct: Wait for async operations to complete
-client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
-require.EventuallyWithT(t, func(c *assert.CollectT) {
-    nodes, err := headscale.ListNodes()
-    assert.NoError(c, err)
-    assert.Len(c, nodes[0].GetAvailableRoutes(), 1)
-}, 10*time.Second, 100*time.Millisecond, "route should be advertised")
-```
-
-**Common Test Patterns**
-- **Route Advertisement**: Use `EventuallyWithT` for route propagation
-- **Node State Changes**: Wait for NodeStore synchronization  
-- **ACL Policy Changes**: Allow time for policy recalculation
-- **Network Connectivity**: Use ping tests with retries
-
-**Test Data Management**
-```go
-// Node identification: Don't assume array ordering
-expectedRoutes := map[string]string{"1": "10.33.0.0/16"}
-for _, node := range nodes {
-    nodeIDStr := fmt.Sprintf("%d", node.GetId())
-    if route, shouldHaveRoute := expectedRoutes[nodeIDStr]; shouldHaveRoute {
-        // Test the node that should have the route
-    }
-}
-```
-
-### Troubleshooting Integration Tests
-
-**Common Failure Patterns**
-1. **Timing Issues**: Test assertions run before async operations complete
-   - **Solution**: Use `EventuallyWithT` with appropriate timeouts
-   - **Timeout Guidelines**: 3-5s for route operations, 10s for complex scenarios
-
-2. **Infrastructure Problems**: Disk space, Docker issues, network conflicts
-   - **Check**: `go run ./cmd/hi doctor` for system health
-   - **Clean**: Remove old test containers and networks
-
-3. **NodeStore Synchronization**: Tests expecting immediate data availability
-   - **Key Points**: Route advertisements must propagate through poll requests
-   - **Fix**: Wait for NodeStore updates after Hostinfo changes
-
-4. **Database Backend Differences**: SQLite vs PostgreSQL behavior differences
-   - **Use**: `--postgres` flag for database-intensive tests
-   - **Note**: Some timing characteristics differ between backends
-
-**Debugging Failed Tests**
-1. **Check test artifacts** in `control_logs/` for detailed logs
-2. **Examine MapResponse JSON** files for protocol-level debugging
-3. **Review Headscale stderr logs** for server-side error messages
-4. **Check Tailscale client status** for network-level issues
-
-**Resource Management**
-- Tests require significant disk space (each run ~100MB of logs)
-- Docker containers are cleaned up automatically on success
-- Failed tests may leave containers running - clean manually if needed
-- Use `docker system prune` periodically to reclaim space
-
-### Best Practices for Test Modifications
-
-1. **Always test locally** before committing integration test changes
-2. **Use appropriate timeouts** - too short causes flaky tests, too long slows CI
-3. **Clean up properly** - ensure tests don't leave persistent state
-4. **Handle both success and failure paths** in test scenarios
-5. **Document timing requirements** for complex test scenarios
-
-## NodeStore Implementation Details
-
-**Key Insight from Recent Work**: The NodeStore is a critical performance optimization that caches node data in memory while ensuring consistency with the database. When working with route advertisements or node state changes:
-
-1. **Timing Considerations**: Route advertisements need time to propagate from clients to server. Use `require.EventuallyWithT()` patterns in tests instead of immediate assertions.
-
-2. **Synchronization Points**: NodeStore updates happen at specific points like `poll.go:420` after Hostinfo changes. Ensure these are maintained when modifying the polling logic.
-
-3. **Peer Visibility**: The NodeStore's `peersFunc` determines which nodes are visible to each other. Policy-based filtering is separate from monitoring visibility - expired nodes should remain visible for debugging but marked as expired.
-
-## Testing Guidelines
-
-### Integration Test Patterns
-```go
-// Use EventuallyWithT for async operations
-require.EventuallyWithT(t, func(c *assert.CollectT) {
-    nodes, err := headscale.ListNodes()
-    assert.NoError(c, err)
-    // Check expected state
-}, 10*time.Second, 100*time.Millisecond, "description")
-
-// Node route checking by actual node properties, not array position
-var routeNode *v1.Node
-for _, node := range nodes {
-    if nodeIDStr := fmt.Sprintf("%d", node.GetId()); expectedRoutes[nodeIDStr] != "" {
-        routeNode = node
-        break
-    }
-}
-```
-
-### Running Problematic Tests
-- Some tests require significant time (e.g., `TestNodeOnlineStatus` runs for 12 minutes)
-- Infrastructure issues like disk space can cause test failures unrelated to code changes  
-- Use `--postgres` flag when testing database-heavy scenarios
-
-## Important Notes
-
-- **Dependencies**: Use `nix develop` for consistent toolchain (Go, buf, protobuf tools, linting)
-- **Protocol Buffers**: Changes to `proto/` require `make generate` and should be committed separately
-- **Code Style**: Enforced via golangci-lint with golines (width 88) and gofumpt formatting
-- **Database**: Supports both SQLite (development) and PostgreSQL (production/testing)
-- **Integration Tests**: Require Docker and can consume significant disk space
-- **Performance**: NodeStore optimizations are critical for scale - be careful with changes to state management
-
-## Debugging Integration Tests
-
-Test artifacts are preserved in `control_logs/TIMESTAMP-ID/` including:
-- Headscale server logs (stderr/stdout)
-- Tailscale client logs and status
-- Database dumps and network captures
-- MapResponse JSON files for protocol debugging
-
-When tests fail, check these artifacts first before assuming code issues.
+@AGENTS.md
--- a/CLI_IMPROVEMENT_PLAN.md
+++ b/CLI_IMPROVEMENT_PLAN.md
--- a/CLI_STANDARDIZATION_SUMMARY.md
+++ b/CLI_STANDARDIZATION_SUMMARY.md
@@ -1,201 +0,0 @@
-# CLI Standardization Summary
-
-## Changes Made
-
-### 1. Command Naming Standardization
- **Fixed**: `backfillips` → `backfill-ips` (with backward compat alias)
- **Fixed**: `dumpConfig` → `dump-config` (with backward compat alias) 
- **Result**: All commands now use kebab-case consistently
-
-### 2. Flag Standardization
-
-#### Node Commands
- **Added**: `--node` flag as primary way to specify nodes
- **Deprecated**: `--identifier` flag (hidden, marked deprecated)
- **Backward Compatible**: Both flags work, `--identifier` shows deprecation warning
- **Smart Lookup Ready**: `--node` accepts strings for future name/hostname/IP lookup
-
-#### User Commands  
- **Updated**: User identification flow prepared for `--user` flag
- **Maintained**: Existing `--name` and `--identifier` flags for backward compatibility
-
-### 3. Description Consistency
- **Fixed**: "Api" → "API" throughout
- **Fixed**: Capitalization consistency in short descriptions
- **Fixed**: Removed unnecessary periods from short descriptions
- **Standardized**: "Handle/Manage the X of Headscale" pattern
-
-### 4. Type Consistency
- **Standardized**: Node IDs use `uint64` consistently
- **Maintained**: Backward compatibility with existing flag types
-
-## Current Status
-
-### ✅ Completed
- Command naming (kebab-case)
- Flag deprecation and aliasing
- Description standardization  
- Backward compatibility preservation
- Helper functions for flag processing
- **SMART LOOKUP IMPLEMENTATION**:
-  - Enhanced `ListNodesRequest` proto with ID, name, hostname, IP filters
-  - Implemented smart filtering in `ListNodes` gRPC method
-  - Added CLI smart lookup functions for nodes and users
-  - Single match validation with helpful error messages
-  - Automatic detection: ID (numeric) vs IP vs name/hostname/email
-
-### ✅ Smart Lookup Features
- **Node Lookup**: By ID, hostname, or IP address
- **User Lookup**: By ID, username, or email address  
- **Single Match Enforcement**: Errors if 0 or >1 matches found
- **Helpful Error Messages**: Shows all matches when ambiguous
- **Full Backward Compatibility**: All existing flags still work
- **Enhanced List Commands**: Both `nodes list` and `users list` support all filter types
-
-## Breaking Changes
-
-**None.** All changes maintain full backward compatibility through flag aliases and deprecation warnings.
-
-## Implementation Details
-
-### Smart Lookup Algorithm
-
-1. **Input Detection**:
-   ```go
-   if numeric && > 0 -> treat as ID
-   else if contains "@" -> treat as email (users only)  
-   else if valid IP address -> treat as IP (nodes only)
-   else -> treat as name/hostname
-   ```
-
-2. **gRPC Filtering**:
-   - Uses enhanced `ListNodes`/`ListUsers` with specific filters
-   - Server-side filtering for optimal performance
-   - Single transaction per lookup
-
-3. **Match Validation**:
-   - Exactly 1 match: Return ID
-   - 0 matches: Error with "not found" message
-   - >1 matches: Error listing all matches for disambiguation
-
-### Enhanced Proto Definitions
-
-```protobuf
-message ListNodesRequest { 
-  string user = 1;           // existing
-  uint64 id = 2;            // new: filter by ID
-  string name = 3;          // new: filter by hostname  
-  string hostname = 4;      // new: alias for name
-  repeated string ip_addresses = 5; // new: filter by IPs
-}
-```
-
-### Future Enhancements
-
- **Fuzzy Matching**: Partial name matching with confirmation
- **Recently Used**: Cache recently accessed nodes/users
- **Tab Completion**: Shell completion for names/hostnames
- **Bulk Operations**: Multi-select with pattern matching
-
-## Migration Path for Users
-
-### Now Available (Current Release)
-```bash
-# Old way (still works, shows deprecation warning)
-headscale nodes expire --identifier 123
-
-# New way with smart lookup:
-headscale nodes expire --node 123                    # by ID
-headscale nodes expire --node "my-laptop"           # by hostname  
-headscale nodes expire --node "100.64.0.1"          # by Tailscale IP
-headscale nodes expire --node "192.168.1.100"       # by real IP
-
-# User operations:
-headscale users destroy --user 123                   # by ID
-headscale users destroy --user "alice"               # by username
-headscale users destroy --user "alice@company.com"   # by email
-
-# Enhanced list commands with filtering:
-headscale nodes list --node "laptop"                 # filter nodes by name
-headscale nodes list --ip "100.64.0.1"              # filter nodes by IP
-headscale nodes list --user "alice"                  # filter nodes by user
-headscale users list --user "alice"                  # smart lookup user
-headscale users list --email "@company.com"          # filter by email domain
-headscale users list --name "alice"                  # filter by exact name
-
-# Error handling examples:
-headscale nodes expire --node "laptop"
-# Error: multiple nodes found matching 'laptop': ID=1 name=laptop-alice, ID=2 name=laptop-bob
-
-headscale nodes expire --node "nonexistent" 
-# Error: no node found matching 'nonexistent'
-```
-
-## Command Structure Overview
-
-```
-headscale [global-flags] <command> [command-flags] <subcommand> [subcommand-flags] [args]
-
-Global Flags:
-  --config, -c     config file path
-  --output, -o     output format (json, yaml, json-line)  
-  --force          disable prompts
-
-Commands:
-├── serve
-├── version  
-├── config-test
-├── dump-config (alias: dumpConfig)
-├── mockoidc
-├── generate/
-│   └── private-key
-├── nodes/
-│   ├── list (--user, --tags, --columns)
-│   ├── register (--user, --key) 
-│   ├── list-routes (--node)
-│   ├── expire (--node)
-│   ├── rename (--node) <new-name>
-│   ├── delete (--node)
-│   ├── move (--node, --user)
-│   ├── tag (--node, --tags)
-│   ├── approve-routes (--node, --routes)
-│   └── backfill-ips (alias: backfillips)
-├── users/
-│   ├── create <name> (--display-name, --email, --picture-url)
-│   ├── list (--user, --name, --email, --columns)
-│   ├── destroy (--user|--name|--identifier)
-│   └── rename (--user|--name|--identifier, --new-name)
-├── apikeys/
-│   ├── list
-│   ├── create (--expiration)
-│   ├── expire (--prefix)
-│   └── delete (--prefix)
-├── preauthkeys/
-│   ├── list (--user)
-│   ├── create (--user, --reusable, --ephemeral, --expiration, --tags)
-│   └── expire (--user) <key>
-├── policy/
-│   ├── get
-│   ├── set (--file)
-│   └── check (--file)
-└── debug/
-    └── create-node (--name, --user, --key, --route)
-```
-
-## Deprecated Flags
-
-All deprecated flags continue to work but show warnings:
-
- `--identifier` → use `--node` (for node commands) or `--user` (for user commands)
- `--namespace` → use `--user` (already implemented)
- `dumpConfig` → use `dump-config`
- `backfillips` → use `backfill-ips`
-
-## Error Handling
-
-Improved error messages provide clear guidance:
-```
-Error: node specifier must be a numeric ID (smart lookup by name/hostname/IP not yet implemented)
-Error: --node flag is required  
-Error: --user flag is required
-```
--- a/Dockerfile.derper
+++ b/Dockerfile.derper
@@ -1,6 +1,6 @@
 # For testing purposes only

-FROM golang:alpine AS build-env
+FROM golang:1.26.1-alpine AS build-env

 WORKDIR /go/src

@@ -12,7 +12,7 @@ WORKDIR /go/src/tailscale
 ARG TARGETARCH
 RUN GOARCH=$TARGETARCH go install -v ./cmd/derper

-FROM alpine:3.18
+FROM alpine:3.22
 RUN apk add --no-cache ca-certificates iptables iproute2 ip6tables curl

 COPY --from=build-env /go/bin/* /usr/local/bin/
--- a/Dockerfile.integration
+++ b/Dockerfile.integration
@@ -2,25 +2,43 @@
 # and are in no way endorsed by Headscale's maintainers as an
 # official nor supported release or distribution.

-FROM docker.io/golang:1.24-bookworm
+FROM docker.io/golang:1.26.1-trixie AS builder
 ARG VERSION=dev
 ENV GOPATH /go
 WORKDIR /go/src/headscale

-RUN apt-get update \
-  && apt-get install --no-install-recommends --yes less jq sqlite3 dnsutils \
-  && rm -rf /var/lib/apt/lists/* \
-  && apt-get clean
-RUN mkdir -p /var/run/headscale
+# Install delve debugger first - rarely changes, good cache candidate
+RUN go install github.com/go-delve/delve/cmd/dlv@latest

+# Download dependencies - only invalidated when go.mod/go.sum change
 COPY go.mod go.sum /go/src/headscale/
 RUN go mod download

+# Copy source and build - invalidated on any source change
 COPY . .

-RUN CGO_ENABLED=0 GOOS=linux go install -a ./cmd/headscale && test -e /go/bin/headscale
+# Build debug binary with debug symbols for delve
+RUN CGO_ENABLED=0 GOOS=linux go build -gcflags="all=-N -l" -o /go/bin/headscale ./cmd/headscale
+
+# Runtime stage
+FROM debian:trixie-slim
+
+RUN apt-get --update install --no-install-recommends --yes \
+    bash ca-certificates curl dnsutils findutils iproute2 jq less procps python3 sqlite3 \
+  && apt-get dist-clean
+
+RUN mkdir -p /var/run/headscale
+
+# Copy binaries from builder
+COPY --from=builder /go/bin/headscale /usr/local/bin/headscale
+COPY --from=builder /go/bin/dlv /usr/local/bin/dlv
+
+# Copy source code for delve source-level debugging
+COPY --from=builder /go/src/headscale /go/src/headscale
+
+WORKDIR /go/src/headscale

 # Need to reset the entrypoint or everything will run as a busybox script
 ENTRYPOINT []
-EXPOSE 8080/tcp
-CMD ["headscale"]
+EXPOSE 8080/tcp 40000/tcp
+CMD ["dlv", "--listen=0.0.0.0:40000", "--headless=true", "--api-version=2", "--accept-multiclient", "exec", "/usr/local/bin/headscale", "--"]
--- a/Dockerfile.integration-ci
+++ b/Dockerfile.integration-ci
@@ -0,0 +1,17 @@
+# Minimal CI image - expects pre-built headscale binary in build context
+# For local development with delve debugging, use Dockerfile.integration instead
+
+FROM debian:trixie-slim
+
+RUN apt-get --update install --no-install-recommends --yes \
+    bash ca-certificates curl dnsutils findutils iproute2 jq less procps python3 sqlite3 \
+  && apt-get dist-clean
+
+RUN mkdir -p /var/run/headscale
+
+# Copy pre-built headscale binary from build context
+COPY headscale /usr/local/bin/headscale
+
+ENTRYPOINT []
+EXPOSE 8080/tcp
+CMD ["/usr/local/bin/headscale"]
--- a/Dockerfile.tailscale-HEAD
+++ b/Dockerfile.tailscale-HEAD
@@ -4,7 +4,7 @@
 # This Dockerfile is more or less lifted from tailscale/tailscale
 # to ensure a similar build process when testing the HEAD of tailscale.

-FROM golang:1.24-alpine AS build-env
+FROM golang:1.26.1-alpine AS build-env

 WORKDIR /go/src

@@ -36,8 +36,10 @@ RUN GOARCH=$TARGETARCH go install -tags="${BUILD_TAGS}" -ldflags="\
      -X tailscale.com/version.gitCommitStamp=$VERSION_GIT_HASH" \
      -v ./cmd/tailscale ./cmd/tailscaled ./cmd/containerboot

-FROM alpine:3.18
-RUN apk add --no-cache ca-certificates iptables iproute2 ip6tables curl
+FROM alpine:3.22
+# Upstream: ca-certificates ip6tables iptables iproute2
+# Tests: curl python3 (traceroute via BusyBox)
+RUN apk add --no-cache ca-certificates curl ip6tables iptables iproute2 python3

 COPY --from=build-env /go/bin/* /usr/local/bin/
 # For compat with the previous run.sh, although ideally you should be
--- a/Makefile
+++ b/Makefile
@@ -21,7 +21,7 @@ endef
 # Source file collections using shell find for better performance
 GO_SOURCES := $(shell find . -name '*.go' -not -path './gen/*' -not -path './vendor/*')
 PROTO_SOURCES := $(shell find . -name '*.proto' -not -path './gen/*' -not -path './vendor/*')
-DOC_SOURCES := $(shell find . \( -name '*.md' -o -name '*.yaml' -o -name '*.yml' -o -name '*.ts' -o -name '*.js' -o -name '*.html' -o -name '*.css' -o -name '*.scss' -o -name '*.sass' \) -not -path './gen/*' -not -path './vendor/*' -not -path './node_modules/*')
+PRETTIER_SOURCES := $(shell find . \( -name '*.md' -o -name '*.yaml' -o -name '*.yml' -o -name '*.ts' -o -name '*.js' -o -name '*.html' -o -name '*.css' -o -name '*.scss' -o -name '*.sass' \) -not -path './gen/*' -not -path './vendor/*' -not -path './node_modules/*')

 # Default target
 .PHONY: all
@@ -33,6 +33,7 @@ check-deps:
 	$(call check_tool,go)
 	$(call check_tool,golangci-lint)
 	$(call check_tool,gofumpt)
+	$(call check_tool,mdformat)
 	$(call check_tool,prettier)
 	$(call check_tool,clang-format)
 	$(call check_tool,buf)
@@ -52,7 +53,7 @@ test: check-deps $(GO_SOURCES) go.mod go.sum

 # Formatting targets
 .PHONY: fmt
-fmt: fmt-go fmt-prettier fmt-proto
+fmt: fmt-go fmt-mdformat fmt-prettier fmt-proto

 .PHONY: fmt-go
 fmt-go: check-deps $(GO_SOURCES)
@@ -60,11 +61,15 @@ fmt-go: check-deps $(GO_SOURCES)
 	gofumpt -l -w .
 	golangci-lint run --fix

+.PHONY: fmt-mdformat
+fmt-mdformat: check-deps
+	@echo "Formatting documentation..."
+	mdformat docs/
+
 .PHONY: fmt-prettier
-fmt-prettier: check-deps $(DOC_SOURCES)
-	@echo "Formatting documentation and config files..."
+fmt-prettier: check-deps $(PRETTIER_SOURCES)
+	@echo "Formatting markup and config files..."
 	prettier --write '**/*.{ts,js,md,yaml,yml,sass,css,scss,html}'
-	prettier --write --print-width 80 --prose-wrap always CHANGELOG.md

 .PHONY: fmt-proto
 fmt-proto: check-deps $(PROTO_SOURCES)
@@ -87,10 +92,9 @@ lint-proto: check-deps $(PROTO_SOURCES)

 # Code generation
 .PHONY: generate
-generate: check-deps $(PROTO_SOURCES)
-	@echo "Generating code from Protocol Buffers..."
-	rm -rf gen
-	buf generate proto
+generate: check-deps
+	@echo "Generating code..."
+	go generate ./...

 # Clean targets
 .PHONY: clean
@@ -118,7 +122,8 @@ help:
 	@echo ""
 	@echo "Specific targets:"
 	@echo "  fmt-go       - Format Go code only"
-	@echo "  fmt-prettier - Format documentation only" 
+	@echo "  fmt-mdformat - Format documentation only"
+	@echo "  fmt-prettier - Format markup and config files only"
 	@echo "  fmt-proto    - Format Protocol Buffer files only"
 	@echo "  lint-go      - Lint Go code only"
 	@echo "  lint-proto   - Lint Protocol Buffer files only"
@@ -127,4 +132,4 @@ help:
 	@echo "  check-deps   - Verify required tools are available"
 	@echo ""
 	@echo "Note: If not running in a nix shell, ensure dependencies are available:"
-	@echo "  nix develop"
+	@echo "  nix develop"
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-![headscale logo](./docs/logo/headscale3_header_stacked_left.png)
+![headscale logo](./docs/assets/logo/headscale3_header_stacked_left.png)

 ![ci](https://github.com/juanfont/headscale/actions/workflows/test.yml/badge.svg)

@@ -63,8 +63,18 @@ and container to run Headscale.**

 Please have a look at the [`documentation`](https://headscale.net/stable/).

+For NixOS users, a module is available in [`nix/`](./nix/).
+
+## Builds from `main`
+
+Development builds from the `main` branch are available as container images and
+binaries. See the [development builds](https://headscale.net/stable/setup/install/main/)
+documentation for details.
+
 ## Talks

+- Fosdem 2026 (video): [Headscale & Tailscale: The complementary open source clone](https://fosdem.org/2026/schedule/event/KYQ3LL-headscale-the-complementary-open-source-clone/)
+  - presented by Kristoffer Dalby
 - Fosdem 2023 (video): [Headscale: How we are using integration testing to reimplement Tailscale](https://fosdem.org/2023/schedule/event/goheadscale/)
  - presented by Juan Font Alonso and Kristoffer Dalby

@@ -103,6 +113,8 @@ run `make lint` and `make fmt` before committing any code.
 The **Proto** code is linted with [`buf`](https://docs.buf.build/lint/overview) and
 formatted with [`clang-format`](https://clang.llvm.org/docs/ClangFormat.html).

+The **docs** are formatted with [`mdformat`](https://mdformat.readthedocs.io).
+
 The **rest** (Markdown, YAML, etc) is formatted with [`prettier`](https://prettier.io).

 Check out the `.golangci.yaml` and `Makefile` to see the specific configuration.
@@ -147,6 +159,7 @@ make build
 We recommend using Nix for dependency management to ensure you have all required tools. If you prefer to manage dependencies yourself, you can use Make directly:

 **With Nix (recommended):**
+
 ```shell
 nix develop
 make test
@@ -154,6 +167,7 @@ make build
 ```

 **With your own dependencies:**
+
 ```shell
 make test
 make build
--- a/cmd/headscale/cli/api_key.go
+++ b/cmd/headscale/cli/api_key.go
@@ -4,15 +4,16 @@ import (
 	"context"
 	"fmt"
 	"strconv"
-	"time"

 	v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
 	"github.com/juanfont/headscale/hscontrol/util"
-	"github.com/prometheus/common/model"
 	"github.com/pterm/pterm"
-	"github.com/rs/zerolog/log"
 	"github.com/spf13/cobra"
-	"google.golang.org/protobuf/types/known/timestamppb"
+)
+
+const (
+	// DefaultAPIKeyExpiry is 90 days.
+	DefaultAPIKeyExpiry = "90d"
 )

 func init() {
@@ -25,52 +26,35 @@ func init() {
 	apiKeysCmd.AddCommand(createAPIKeyCmd)

 	expireAPIKeyCmd.Flags().StringP("prefix", "p", "", "ApiKey prefix")
-	if err := expireAPIKeyCmd.MarkFlagRequired("prefix"); err != nil {
-		log.Fatal().Err(err).Msg("")
-	}
+	expireAPIKeyCmd.Flags().Uint64P("id", "i", 0, "ApiKey ID")
 	apiKeysCmd.AddCommand(expireAPIKeyCmd)

 	deleteAPIKeyCmd.Flags().StringP("prefix", "p", "", "ApiKey prefix")
-	if err := deleteAPIKeyCmd.MarkFlagRequired("prefix"); err != nil {
-		log.Fatal().Err(err).Msg("")
-	}
+	deleteAPIKeyCmd.Flags().Uint64P("id", "i", 0, "ApiKey ID")
 	apiKeysCmd.AddCommand(deleteAPIKeyCmd)
 }

 var apiKeysCmd = &cobra.Command{
 	Use:     "apikeys",
-	Short:   "Handle the API keys in Headscale",
+	Short:   "Handle the Api keys in Headscale",
 	Aliases: []string{"apikey", "api"},
 }

 var listAPIKeys = &cobra.Command{
 	Use:     "list",
-	Short:   "List the API keys for Headscale",
+	Short:   "List the Api keys for headscale",
 	Aliases: []string{"ls", "show"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
-
-		err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			request := &v1.ListApiKeysRequest{}
-
-			response, err := client.ListApiKeys(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf("Error getting the list of keys: %s", err),
-					output,
-				)
-				return err
-			}
-
-			if output != "" {
-				SuccessOutput(response.GetApiKeys(), "", output)
-				return nil
-			}
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		response, err := client.ListApiKeys(ctx, &v1.ListApiKeysRequest{})
+		if err != nil {
+			return fmt.Errorf("listing api keys: %w", err)
+		}

+		return printListOutput(cmd, response.GetApiKeys(), func() error {
 			tableData := pterm.TableData{
 				{"ID", "Prefix", "Expiration", "Created"},
 			}
+
 			for _, key := range response.GetApiKeys() {
 				expiration := "-"

@@ -84,142 +68,94 @@ var listAPIKeys = &cobra.Command{
 					expiration,
 					key.GetCreatedAt().AsTime().Format(HeadscaleDateTimeFormat),
 				})
+			}

-			}
-			err = pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
-			if err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf("Failed to render pterm table: %s", err),
-					output,
-				)
-				return err
-			}
-			return nil
+			return pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
 		})
-		if err != nil {
-			return
-		}
-	},
+	}),
 }

 var createAPIKeyCmd = &cobra.Command{
 	Use:   "create",
-	Short: "Create a new API key",
+	Short: "Creates a new Api key",
 	Long: `
 Creates a new Api key, the Api key is only visible on creation
 and cannot be retrieved again.
 If you loose a key, create a new one and revoke (expire) the old one.`,
 	Aliases: []string{"c", "new"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
-
-		request := &v1.CreateApiKeyRequest{}
-
-		durationStr, _ := cmd.Flags().GetString("expiration")
-
-		duration, err := model.ParseDuration(durationStr)
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		expiration, err := expirationFromFlag(cmd)
 		if err != nil {
-			ErrorOutput(
-				err,
-				fmt.Sprintf("Could not parse duration: %s\n", err),
-				output,
-			)
-			return
+			return err
 		}

-		expiration := time.Now().UTC().Add(time.Duration(duration))
-
-		request.Expiration = timestamppb.New(expiration)
-
-		err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			response, err := client.CreateApiKey(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf("Cannot create Api Key: %s\n", err),
-					output,
-				)
-				return err
-			}
-
-			SuccessOutput(response.GetApiKey(), response.GetApiKey(), output)
-			return nil
+		response, err := client.CreateApiKey(ctx, &v1.CreateApiKeyRequest{
+			Expiration: expiration,
 		})
 		if err != nil {
-			return
+			return fmt.Errorf("creating api key: %w", err)
 		}
-	},
+
+		return printOutput(cmd, response.GetApiKey(), response.GetApiKey())
+	}),
+}
+
+// apiKeyIDOrPrefix reads --id and --prefix from cmd and validates that
+// exactly one is provided.
+func apiKeyIDOrPrefix(cmd *cobra.Command) (uint64, string, error) {
+	id, _ := cmd.Flags().GetUint64("id")
+	prefix, _ := cmd.Flags().GetString("prefix")
+
+	switch {
+	case id == 0 && prefix == "":
+		return 0, "", fmt.Errorf("either --id or --prefix must be provided: %w", errMissingParameter)
+	case id != 0 && prefix != "":
+		return 0, "", fmt.Errorf("only one of --id or --prefix can be provided: %w", errMissingParameter)
+	}
+
+	return id, prefix, nil
 }

 var expireAPIKeyCmd = &cobra.Command{
 	Use:     "expire",
-	Short:   "Expire an API key",
+	Short:   "Expire an ApiKey",
 	Aliases: []string{"revoke", "exp", "e"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
-		prefix, err := cmd.Flags().GetString("prefix")
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		id, prefix, err := apiKeyIDOrPrefix(cmd)
 		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error getting prefix from CLI flag: %s", err), output)
-			return
+			return err
 		}

-		err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			request := &v1.ExpireApiKeyRequest{
-				Prefix: prefix,
-			}
-
-			response, err := client.ExpireApiKey(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf("Cannot expire Api Key: %s\n", err),
-					output,
-				)
-				return err
-			}
-
-			SuccessOutput(response, "Key expired", output)
-			return nil
+		response, err := client.ExpireApiKey(ctx, &v1.ExpireApiKeyRequest{
+			Id:     id,
+			Prefix: prefix,
 		})
 		if err != nil {
-			return
+			return fmt.Errorf("expiring api key: %w", err)
 		}
-	},
+
+		return printOutput(cmd, response, "Key expired")
+	}),
 }

 var deleteAPIKeyCmd = &cobra.Command{
 	Use:     "delete",
-	Short:   "Delete an API key",
+	Short:   "Delete an ApiKey",
 	Aliases: []string{"remove", "del"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
-		prefix, err := cmd.Flags().GetString("prefix")
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		id, prefix, err := apiKeyIDOrPrefix(cmd)
 		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error getting prefix from CLI flag: %s", err), output)
-			return
+			return err
 		}

-		err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			request := &v1.DeleteApiKeyRequest{
-				Prefix: prefix,
-			}
-
-			response, err := client.DeleteApiKey(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf("Cannot delete Api Key: %s\n", err),
-					output,
-				)
-				return err
-			}
-
-			SuccessOutput(response, "Key deleted", output)
-			return nil
+		response, err := client.DeleteApiKey(ctx, &v1.DeleteApiKeyRequest{
+			Id:     id,
+			Prefix: prefix,
 		})
 		if err != nil {
-			return
+			return fmt.Errorf("deleting api key: %w", err)
 		}
-	},
+
+		return printOutput(cmd, response, "Key deleted")
+	}),
 }
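
The "exactly one of `--id` or `--prefix`" rule in `apiKeyIDOrPrefix` can be tried in isolation. The sketch below repeats the same switch but takes plain values instead of reading cobra flags; `idOrPrefix` is a hypothetical name, and `errMissingParameter` is redefined locally as a stand-in because its real definition is not part of this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissingParameter stands in for the sentinel referenced in the diff;
// its real definition is not shown there.
var errMissingParameter = errors.New("missing parameter")

// idOrPrefix mirrors the switch in apiKeyIDOrPrefix: it rejects the case
// where neither value is set and the case where both are set.
func idOrPrefix(id uint64, prefix string) (uint64, string, error) {
	switch {
	case id == 0 && prefix == "":
		return 0, "", fmt.Errorf("either --id or --prefix must be provided: %w", errMissingParameter)
	case id != 0 && prefix != "":
		return 0, "", fmt.Errorf("only one of --id or --prefix can be provided: %w", errMissingParameter)
	}

	return id, prefix, nil
}

func main() {
	// Try the four combinations of set and unset values.
	cases := []struct {
		id     uint64
		prefix string
	}{{0, ""}, {7, ""}, {0, "abc"}, {7, "abc"}}

	for _, c := range cases {
		_, _, err := idOrPrefix(c.id, c.prefix)
		fmt.Printf("id=%d prefix=%q err=%v\n", c.id, c.prefix, err)
	}
}
```

Because the zero value doubles as "flag not set", an explicit `--id 0` is indistinguishable from omitting the flag under this scheme.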
--- a/cmd/headscale/cli/auth.go
+++ b/cmd/headscale/cli/auth.go
@@ -0,0 +1,93 @@
+package cli
+
+import (
+	"context"
+	"fmt"
+
+	v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
+	"github.com/spf13/cobra"
+)
+
+func init() {
+	rootCmd.AddCommand(authCmd)
+
+	authRegisterCmd.Flags().StringP("user", "u", "", "User")
+	authRegisterCmd.Flags().String("auth-id", "", "Auth ID")
+	mustMarkRequired(authRegisterCmd, "user", "auth-id")
+	authCmd.AddCommand(authRegisterCmd)
+
+	authApproveCmd.Flags().String("auth-id", "", "Auth ID")
+	mustMarkRequired(authApproveCmd, "auth-id")
+	authCmd.AddCommand(authApproveCmd)
+
+	authRejectCmd.Flags().String("auth-id", "", "Auth ID")
+	mustMarkRequired(authRejectCmd, "auth-id")
+	authCmd.AddCommand(authRejectCmd)
+}
+
+var authCmd = &cobra.Command{
+	Use:   "auth",
+	Short: "Manage node authentication and approval",
+}
+
+var authRegisterCmd = &cobra.Command{
+	Use:   "register",
+	Short: "Register a node to your network",
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		user, _ := cmd.Flags().GetString("user")
+		authID, _ := cmd.Flags().GetString("auth-id")
+
+		request := &v1.AuthRegisterRequest{
+			AuthId: authID,
+			User:   user,
+		}
+
+		response, err := client.AuthRegister(ctx, request)
+		if err != nil {
+			return fmt.Errorf("registering node: %w", err)
+		}
+
+		return printOutput(
+			cmd,
+			response.GetNode(),
+			fmt.Sprintf("Node %s registered", response.GetNode().GetGivenName()))
+	}),
+}
+
+var authApproveCmd = &cobra.Command{
+	Use:   "approve",
+	Short: "Approve a pending authentication request",
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		authID, _ := cmd.Flags().GetString("auth-id")
+
+		request := &v1.AuthApproveRequest{
+			AuthId: authID,
+		}
+
+		response, err := client.AuthApprove(ctx, request)
+		if err != nil {
+			return fmt.Errorf("approving auth request: %w", err)
+		}
+
+		return printOutput(cmd, response, "Auth request approved")
+	}),
+}
+
+var authRejectCmd = &cobra.Command{
+	Use:   "reject",
+	Short: "Reject a pending authentication request",
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		authID, _ := cmd.Flags().GetString("auth-id")
+
+		request := &v1.AuthRejectRequest{
+			AuthId: authID,
+		}
+
+		response, err := client.AuthReject(ctx, request)
+		if err != nil {
+			return fmt.Errorf("rejecting auth request: %w", err)
+		}
+
+		return printOutput(cmd, response, "Auth request rejected")
+	}),
+}
--- a/cmd/headscale/cli/client.go
+++ b/cmd/headscale/cli/client.go
@@ -1,16 +0,0 @@
-package cli
-
-import (
-	"context"
-
-	v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
-)
-
-// WithClient handles gRPC client setup and cleanup, calls fn with client and context
-func WithClient(fn func(context.Context, v1.HeadscaleServiceClient) error) error {
-	ctx, client, conn, cancel := newHeadscaleCLIWithConfig()
-	defer cancel()
-	defer conn.Close()
-
-	return fn(ctx, client)
-}
--- a/cmd/headscale/cli/configtest.go
+++ b/cmd/headscale/cli/configtest.go
@@ -1,7 +1,8 @@
 package cli

 import (
-	"github.com/rs/zerolog/log"
+	"fmt"
+
 	"github.com/spf13/cobra"
 )

@@ -11,12 +12,14 @@ func init() {

 var configTestCmd = &cobra.Command{
 	Use:   "configtest",
-	Short: "Test the configuration",
-	Long:  "Run a test of the configuration and exit",
-	Run: func(cmd *cobra.Command, args []string) {
+	Short: "Test the configuration.",
+	Long:  "Run a test of the configuration and exit.",
+	RunE: func(cmd *cobra.Command, args []string) error {
 		_, err := newHeadscaleServerWithConfig()
 		if err != nil {
-			log.Fatal().Caller().Err(err).Msg("Error initializing")
+			return fmt.Errorf("configuration error: %w", err)
 		}
+
+		return nil
 	},
 }
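
Switching `configtest` from `Run` with `log.Fatal()` to `RunE` with a returned error means the failure path no longer exits the process, so it can be asserted on directly. A minimal sketch of that idea, assuming a hypothetical `loadConfig` in place of `newHeadscaleServerWithConfig` and a hypothetical `errNoServerURL` sentinel:

```go
package main

import (
	"errors"
	"fmt"
)

// errNoServerURL is a hypothetical validation error used only in this sketch.
var errNoServerURL = errors.New("server_url is empty")

// loadConfig is a hypothetical stand-in for newHeadscaleServerWithConfig.
// It takes its input as a parameter instead of reading global state.
func loadConfig(serverURL string) error {
	if serverURL == "" {
		return errNoServerURL
	}

	return nil
}

// runConfigTest mirrors the shape of the new RunE body: wrap the
// underlying error with %w and return it instead of exiting.
func runConfigTest(serverURL string) error {
	if err := loadConfig(serverURL); err != nil {
		return fmt.Errorf("configuration error: %w", err)
	}

	return nil
}

func main() {
	fmt.Println("valid config:", runConfigTest("https://headscale.example.com"))
	fmt.Println("empty config:", runConfigTest(""))
}
```

Wrapping with `%w` keeps the original error reachable through `errors.Is`, so a caller can still test for the specific cause.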
--- a/cmd/headscale/cli/configtest_test.go
+++ b/cmd/headscale/cli/configtest_test.go
@@ -1,46 +0,0 @@
-package cli
-
-import (
-	"testing"
-
-	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"
-)
-
-func TestConfigTestCommand(t *testing.T) {
-	// Test that the configtest command exists and is properly configured
-	assert.NotNil(t, configTestCmd)
-	assert.Equal(t, "configtest", configTestCmd.Use)
-	assert.Equal(t, "Test the configuration.", configTestCmd.Short)
-	assert.Equal(t, "Run a test of the configuration and exit.", configTestCmd.Long)
-	assert.NotNil(t, configTestCmd.Run)
-}
-
-func TestConfigTestCommandInRootCommand(t *testing.T) {
-	// Test that configtest is available as a subcommand of root
-	cmd, _, err := rootCmd.Find([]string{"configtest"})
-	require.NoError(t, err)
-	assert.Equal(t, "configtest", cmd.Name())
-	assert.Equal(t, configTestCmd, cmd)
-}
-
-func TestConfigTestCommandHelp(t *testing.T) {
-	// Test that the command has proper help text
-	assert.NotEmpty(t, configTestCmd.Short)
-	assert.NotEmpty(t, configTestCmd.Long)
-	assert.Contains(t, configTestCmd.Short, "configuration")
-	assert.Contains(t, configTestCmd.Long, "test")
-	assert.Contains(t, configTestCmd.Long, "configuration")
-}
-
-// Note: We can't easily test the actual execution of configtest because:
-// 1. It depends on configuration files being present
-// 2. It calls log.Fatal() which would exit the test process
-// 3. It tries to initialize a full Headscale server
-//
-// In a real refactor, we would:
-// 1. Extract the configuration validation logic to a testable function
-// 2. Return errors instead of calling log.Fatal()
-// 3. Accept configuration as a parameter instead of loading from global state
-//
-// For now, we test the command structure and that it's properly wired up.
--- a/cmd/headscale/cli/debug.go
+++ b/cmd/headscale/cli/debug.go
@@ -6,34 +6,17 @@ import (

 	v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
 	"github.com/juanfont/headscale/hscontrol/types"
-	"github.com/rs/zerolog/log"
 	"github.com/spf13/cobra"
-	"google.golang.org/grpc/status"
-)
-
-const (
-	errPreAuthKeyMalformed = Error("key is malformed. expected 64 hex characters with `nodekey` prefix")
 )

 func init() {
 	rootCmd.AddCommand(debugCmd)

 	createNodeCmd.Flags().StringP("name", "", "", "Name")
-	err := createNodeCmd.MarkFlagRequired("name")
-	if err != nil {
-		log.Fatal().Err(err).Msg("")
-	}
 	createNodeCmd.Flags().StringP("user", "u", "", "User")
-
-	err = createNodeCmd.MarkFlagRequired("user")
-	if err != nil {
-		log.Fatal().Err(err).Msg("")
-	}
 	createNodeCmd.Flags().StringP("key", "k", "", "Key")
-	err = createNodeCmd.MarkFlagRequired("key")
-	if err != nil {
-		log.Fatal().Err(err).Msg("")
-	}
+	mustMarkRequired(createNodeCmd, "name", "user", "key")
+
 	createNodeCmd.Flags().
 		StringSliceP("route", "r", []string{}, "List (or repeated flags) of routes to advertise")

@@ -48,79 +31,31 @@ var debugCmd = &cobra.Command{

 var createNodeCmd = &cobra.Command{
 	Use:   "create-node",
-	Short: "Create a node that can be registered with `nodes register <>` command",
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
+	Short: "Create a node that can be registered with `auth register <>` command",
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		user, _ := cmd.Flags().GetString("user")
+		name, _ := cmd.Flags().GetString("name")
+		registrationID, _ := cmd.Flags().GetString("key")

-		user, err := cmd.Flags().GetString("user")
+		_, err := types.AuthIDFromString(registrationID)
 		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error getting user: %s", err), output)
-			return
+			return fmt.Errorf("parsing machine key: %w", err)
 		}

-		name, err := cmd.Flags().GetString("name")
-		if err != nil {
-			ErrorOutput(
-				err,
-				fmt.Sprintf("Error getting node from flag: %s", err),
-				output,
-			)
-			return
+		routes, _ := cmd.Flags().GetStringSlice("route")
+
+		request := &v1.DebugCreateNodeRequest{
+			Key:    registrationID,
+			Name:   name,
+			User:   user,
+			Routes: routes,
 		}

-		registrationID, err := cmd.Flags().GetString("key")
+		response, err := client.DebugCreateNode(ctx, request)
 		if err != nil {
-			ErrorOutput(
-				err,
-				fmt.Sprintf("Error getting key from flag: %s", err),
-				output,
-			)
-			return
+			return fmt.Errorf("creating node: %w", err)
 		}

-		_, err = types.RegistrationIDFromString(registrationID)
-		if err != nil {
-			ErrorOutput(
-				err,
-				fmt.Sprintf("Failed to parse machine key from flag: %s", err),
-				output,
-			)
-			return
-		}
-
-		routes, err := cmd.Flags().GetStringSlice("route")
-		if err != nil {
-			ErrorOutput(
-				err,
-				fmt.Sprintf("Error getting routes from flag: %s", err),
-				output,
-			)
-			return
-		}
-
-		err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			request := &v1.DebugCreateNodeRequest{
-				Key:    registrationID,
-				Name:   name,
-				User:   user,
-				Routes: routes,
-			}
-
-			response, err := client.DebugCreateNode(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					"Cannot create node: "+status.Convert(err).Message(),
-					output,
-				)
-				return err
-			}
-
-			SuccessOutput(response.GetNode(), "Node created", output)
-			return nil
-		})
-		if err != nil {
-			return
-		}
-	},
+		return printOutput(cmd, response.GetNode(), "Node created")
+	}),
 }
--- a/cmd/headscale/cli/debug_test.go
+++ b/cmd/headscale/cli/debug_test.go
@@ -1,144 +0,0 @@
-package cli
-
-import (
-	"testing"
-
-	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"
-)
-
-func TestDebugCommand(t *testing.T) {
-	// Test that the debug command exists and is properly configured
-	assert.NotNil(t, debugCmd)
-	assert.Equal(t, "debug", debugCmd.Use)
-	assert.Equal(t, "debug and testing commands", debugCmd.Short)
-	assert.Equal(t, "debug contains extra commands used for debugging and testing headscale", debugCmd.Long)
-}
-
-func TestDebugCommandInRootCommand(t *testing.T) {
-	// Test that debug is available as a subcommand of root
-	cmd, _, err := rootCmd.Find([]string{"debug"})
-	require.NoError(t, err)
-	assert.Equal(t, "debug", cmd.Name())
-	assert.Equal(t, debugCmd, cmd)
-}
-
-func TestCreateNodeCommand(t *testing.T) {
-	// Test that the create-node command exists and is properly configured
-	assert.NotNil(t, createNodeCmd)
-	assert.Equal(t, "create-node", createNodeCmd.Use)
-	assert.Equal(t, "Create a node that can be registered with `nodes register <>` command", createNodeCmd.Short)
-	assert.NotNil(t, createNodeCmd.Run)
-}
-
-func TestCreateNodeCommandInDebugCommand(t *testing.T) {
-	// Test that create-node is available as a subcommand of debug
-	cmd, _, err := rootCmd.Find([]string{"debug", "create-node"})
-	require.NoError(t, err)
-	assert.Equal(t, "create-node", cmd.Name())
-	assert.Equal(t, createNodeCmd, cmd)
-}
-
-func TestCreateNodeCommandFlags(t *testing.T) {
-	// Test that create-node has the required flags
-
-	// Test name flag
-	nameFlag := createNodeCmd.Flags().Lookup("name")
-	assert.NotNil(t, nameFlag)
-	assert.Equal(t, "", nameFlag.Shorthand) // No shorthand for name
-	assert.Equal(t, "", nameFlag.DefValue)
-
-	// Test user flag
-	userFlag := createNodeCmd.Flags().Lookup("user")
-	assert.NotNil(t, userFlag)
-	assert.Equal(t, "u", userFlag.Shorthand)
-
-	// Test key flag
-	keyFlag := createNodeCmd.Flags().Lookup("key")
-	assert.NotNil(t, keyFlag)
-	assert.Equal(t, "k", keyFlag.Shorthand)
-
-	// Test route flag
-	routeFlag := createNodeCmd.Flags().Lookup("route")
-	assert.NotNil(t, routeFlag)
-	assert.Equal(t, "r", routeFlag.Shorthand)
-
-}
-
-func TestCreateNodeCommandRequiredFlags(t *testing.T) {
-	// Test that required flags are marked as required
-	// We can't easily test the actual requirement enforcement without executing the command
-	// But we can test that the flags exist and have the expected properties
-
-	// These flags should be required based on the init() function
-	requiredFlags := []string{"name", "user", "key"}
-
-	for _, flagName := range requiredFlags {
-		flag := createNodeCmd.Flags().Lookup(flagName)
-		assert.NotNil(t, flag, "Required flag %s should exist", flagName)
-	}
-}
-
-func TestErrorType(t *testing.T) {
-	// Test the Error type implementation
-	err := errPreAuthKeyMalformed
-	assert.Equal(t, "key is malformed. expected 64 hex characters with `nodekey` prefix", err.Error())
-	assert.Equal(t, "key is malformed. expected 64 hex characters with `nodekey` prefix", string(err))
-
-	// Test that it implements the error interface
-	var genericErr error = err
-	assert.Equal(t, "key is malformed. expected 64 hex characters with `nodekey` prefix", genericErr.Error())
-}
-
-func TestErrorConstants(t *testing.T) {
-	// Test that error constants are defined properly
-	assert.Equal(t, Error("key is malformed. expected 64 hex characters with `nodekey` prefix"), errPreAuthKeyMalformed)
-}
-
-func TestDebugCommandStructure(t *testing.T) {
-	// Test that debug has create-node as a subcommand
-	found := false
-	for _, subcmd := range debugCmd.Commands() {
-		if subcmd.Name() == "create-node" {
-			found = true
-			break
-		}
-	}
-	assert.True(t, found, "create-node should be a subcommand of debug")
-}
-
-func TestCreateNodeCommandHelp(t *testing.T) {
-	// Test that the command has proper help text
-	assert.NotEmpty(t, createNodeCmd.Short)
-	assert.Contains(t, createNodeCmd.Short, "Create a node")
-	assert.Contains(t, createNodeCmd.Short, "nodes register")
-}
-
-func TestCreateNodeCommandFlagDescriptions(t *testing.T) {
-	// Test that flags have appropriate usage descriptions
-	nameFlag := createNodeCmd.Flags().Lookup("name")
-	assert.Equal(t, "Name", nameFlag.Usage)
-
-	userFlag := createNodeCmd.Flags().Lookup("user")
-	assert.Equal(t, "User", userFlag.Usage)
-
-	keyFlag := createNodeCmd.Flags().Lookup("key")
-	assert.Equal(t, "Key", keyFlag.Usage)
-
-	routeFlag := createNodeCmd.Flags().Lookup("route")
-	assert.Contains(t, routeFlag.Usage, "routes to advertise")
-
-}
-
-// Note: We can't easily test the actual execution of create-node because:
-// 1. It depends on gRPC client configuration
-// 2. It calls SuccessOutput/ErrorOutput which exit the process
-// 3. It requires valid registration keys and user setup
-//
-// In a real refactor, we would:
-// 1. Extract the business logic to testable functions
-// 2. Use dependency injection for the gRPC client
-// 3. Return errors instead of calling ErrorOutput/SuccessOutput
-// 4. Add validation functions that can be tested independently
-//
-// For now, we test the command structure and flag configuration.
--- a/cmd/headscale/cli/dump_config.go
+++ b/cmd/headscale/cli/dump_config.go
@@ -12,18 +12,15 @@ func init() {
 }

 var dumpConfigCmd = &cobra.Command{
-	Use:     "dump-config",
-	Short:   "Dump current config to /etc/headscale/config.dump.yaml, integration test only",
-	Aliases: []string{"dumpConfig"},
-	Hidden:  true,
-	Args: func(cmd *cobra.Command, args []string) error {
-		return nil
-	},
-	Run: func(cmd *cobra.Command, args []string) {
+	Use:    "dumpConfig",
+	Short:  "dump current config to /etc/headscale/config.dump.yaml, integration test only",
+	Hidden: true,
+	RunE: func(cmd *cobra.Command, args []string) error {
 		err := viper.WriteConfigAs("/etc/headscale/config.dump.yaml")
 		if err != nil {
-			//nolint
-			fmt.Println("Failed to dump config")
+			return fmt.Errorf("dumping config: %w", err)
 		}
+
+		return nil
 	},
 }
--- a/cmd/headscale/cli/generate.go
+++ b/cmd/headscale/cli/generate.go
@@ -21,22 +21,17 @@ var generateCmd = &cobra.Command{
 var generatePrivateKeyCmd = &cobra.Command{
 	Use:   "private-key",
 	Short: "Generate a private key for the headscale server",
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
+	RunE: func(cmd *cobra.Command, args []string) error {
 		machineKey := key.NewMachine()

 		machineKeyStr, err := machineKey.MarshalText()
 		if err != nil {
-			ErrorOutput(
-				err,
-				fmt.Sprintf("Error getting machine key from flag: %s", err),
-				output,
-			)
+			return fmt.Errorf("marshalling machine key: %w", err)
 		}

-		SuccessOutput(map[string]string{
+		return printOutput(cmd, map[string]string{
 			"private_key": string(machineKeyStr),
 		},
-			string(machineKeyStr), output)
+			string(machineKeyStr))
 	},
 }
--- a/cmd/headscale/cli/generate_test.go
+++ b/cmd/headscale/cli/generate_test.go
@@ -1,230 +0,0 @@
-package cli
-
-import (
-	"bytes"
-	"encoding/json"
-	"strings"
-	"testing"
-
-	"github.com/spf13/cobra"
-	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"
-	"gopkg.in/yaml.v3"
-)
-
-func TestGenerateCommand(t *testing.T) {
-	// Test that the generate command exists and shows help
-	cmd := &cobra.Command{
-		Use:   "headscale",
-		Short: "headscale - a Tailscale control server",
-	}
-
-	cmd.AddCommand(generateCmd)
-
-	out := new(bytes.Buffer)
-	cmd.SetOut(out)
-	cmd.SetErr(out)
-	cmd.SetArgs([]string{"generate", "--help"})
-
-	err := cmd.Execute()
-	require.NoError(t, err)
-
-	outStr := out.String()
-	assert.Contains(t, outStr, "Generate commands")
-	assert.Contains(t, outStr, "private-key")
-	assert.Contains(t, outStr, "Aliases:")
-	assert.Contains(t, outStr, "gen")
-}
-
-func TestGenerateCommandAlias(t *testing.T) {
-	// Test that the "gen" alias works
-	cmd := &cobra.Command{
-		Use:   "headscale",
-		Short: "headscale - a Tailscale control server",
-	}
-
-	cmd.AddCommand(generateCmd)
-
-	out := new(bytes.Buffer)
-	cmd.SetOut(out)
-	cmd.SetErr(out)
-	cmd.SetArgs([]string{"gen", "--help"})
-
-	err := cmd.Execute()
-	require.NoError(t, err)
-
-	outStr := out.String()
-	assert.Contains(t, outStr, "Generate commands")
-}
-
-func TestGeneratePrivateKeyCommand(t *testing.T) {
-	tests := []struct {
-		name       string
-		args       []string
-		expectJSON bool
-		expectYAML bool
-	}{
-		{
-			name:       "default output",
-			args:       []string{"generate", "private-key"},
-			expectJSON: false,
-			expectYAML: false,
-		},
-		{
-			name:       "json output",
-			args:       []string{"generate", "private-key", "--output", "json"},
-			expectJSON: true,
-			expectYAML: false,
-		},
-		{
-			name:       "yaml output",
-			args:       []string{"generate", "private-key", "--output", "yaml"},
-			expectJSON: false,
-			expectYAML: true,
-		},
-	}
-
-	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
-			// Note: This command calls SuccessOutput which exits the process
-			// We can't test the actual execution easily without mocking
-			// Instead, we test the command structure and that it exists
-
-			cmd := &cobra.Command{
-				Use:   "headscale",
-				Short: "headscale - a Tailscale control server",
-			}
-
-			cmd.AddCommand(generateCmd)
-			cmd.PersistentFlags().StringP("output", "o", "", "Output format")
-
-			// Test that the command exists and can be found
-			privateKeyCmd, _, err := cmd.Find([]string{"generate", "private-key"})
-			require.NoError(t, err)
-			assert.Equal(t, "private-key", privateKeyCmd.Name())
-			assert.Equal(t, "Generate a private key for the headscale server", privateKeyCmd.Short)
-		})
-	}
-}
-
-func TestGeneratePrivateKeyHelp(t *testing.T) {
-	cmd := &cobra.Command{
-		Use:   "headscale",
-		Short: "headscale - a Tailscale control server",
-	}
-
-	cmd.AddCommand(generateCmd)
-
-	out := new(bytes.Buffer)
-	cmd.SetOut(out)
-	cmd.SetErr(out)
-	cmd.SetArgs([]string{"generate", "private-key", "--help"})
-
-	err := cmd.Execute()
-	require.NoError(t, err)
-
-	outStr := out.String()
-	assert.Contains(t, outStr, "Generate a private key for the headscale server")
-	assert.Contains(t, outStr, "Usage:")
-}
-
-// Test the key generation logic in isolation (without SuccessOutput/ErrorOutput)
-func TestPrivateKeyGeneration(t *testing.T) {
-	// We can't easily test the full command because it calls SuccessOutput which exits
-	// But we can test that the key generation produces valid output format
-
-	// This is testing the core logic that would be in the command
-	// In a real refactor, we'd extract this to a testable function
-
-	// For now, we can test that the command structure is correct
-	assert.NotNil(t, generatePrivateKeyCmd)
-	assert.Equal(t, "private-key", generatePrivateKeyCmd.Use)
-	assert.Equal(t, "Generate a private key for the headscale server", generatePrivateKeyCmd.Short)
-	assert.NotNil(t, generatePrivateKeyCmd.Run)
-}
-
-func TestGenerateCommandStructure(t *testing.T) {
-	// Test the command hierarchy
-	assert.Equal(t, "generate", generateCmd.Use)
-	assert.Equal(t, "Generate commands", generateCmd.Short)
-	assert.Contains(t, generateCmd.Aliases, "gen")
-
-	// Test that private-key is a subcommand
-	found := false
-	for _, subcmd := range generateCmd.Commands() {
-		if subcmd.Name() == "private-key" {
-			found = true
-			break
-		}
-	}
-	assert.True(t, found, "private-key should be a subcommand of generate")
-}
-
-// Helper function to test output formats (would be used if we refactored the command)
-func validatePrivateKeyOutput(t *testing.T, output string, format string) {
-	switch format {
-	case "json":
-		var result map[string]interface{}
-		err := json.Unmarshal([]byte(output), &result)
-		require.NoError(t, err, "Output should be valid JSON")
-
-		privateKey, exists := result["private_key"]
-		require.True(t, exists, "JSON should contain private_key field")
-
-		keyStr, ok := privateKey.(string)
-		require.True(t, ok, "private_key should be a string")
-		require.NotEmpty(t, keyStr, "private_key should not be empty")
-
-		// Basic validation that it looks like a machine key
-		assert.True(t, strings.HasPrefix(keyStr, "mkey:"), "Machine key should start with mkey:")
-
-	case "yaml":
-		var result map[string]interface{}
-		err := yaml.Unmarshal([]byte(output), &result)
-		require.NoError(t, err, "Output should be valid YAML")
-
-		privateKey, exists := result["private_key"]
-		require.True(t, exists, "YAML should contain private_key field")
-
-		keyStr, ok := privateKey.(string)
-		require.True(t, ok, "private_key should be a string")
-		require.NotEmpty(t, keyStr, "private_key should not be empty")
-
-		assert.True(t, strings.HasPrefix(keyStr, "mkey:"), "Machine key should start with mkey:")
-
-	default:
-		// Default format should just be the key itself
-		assert.True(t, strings.HasPrefix(output, "mkey:"), "Default output should be the machine key")
-		assert.NotContains(t, output, "{", "Default output should not contain JSON")
-		assert.NotContains(t, output, "private_key:", "Default output should not contain YAML structure")
-	}
-}
-
-func TestPrivateKeyOutputFormats(t *testing.T) {
-	// Test cases for different output formats
-	// These test the validation logic we would use after refactoring
-
-	tests := []struct {
-		format string
-		sample string
-	}{
-		{
-			format: "json",
-			sample: `{"private_key": "mkey:abcd1234567890abcd1234567890abcd1234567890abcd1234567890abcd1234"}`,
-		},
-		{
-			format: "yaml",
-			sample: "private_key: mkey:abcd1234567890abcd1234567890abcd1234567890abcd1234567890abcd1234\n",
-		},
-		{
-			format: "",
-			sample: "mkey:abcd1234567890abcd1234567890abcd1234567890abcd1234567890abcd1234",
-		},
-	}
-
-	for _, tt := range tests {
-		t.Run("format_"+tt.format, func(t *testing.T) {
-			validatePrivateKeyOutput(t, tt.sample, tt.format)
-		})
-	}
-}
--- a/cmd/headscale/cli/health.go
+++ b/cmd/headscale/cli/health.go
@@ -0,0 +1,27 @@
+package cli
+
+import (
+	"context"
+	"fmt"
+
+	v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
+	"github.com/spf13/cobra"
+)
+
+func init() {
+	rootCmd.AddCommand(healthCmd)
+}
+
+var healthCmd = &cobra.Command{
+	Use:   "health",
+	Short: "Check the health of the Headscale server",
+	Long:  "Check the health of the Headscale server. This command will return an exit code of 0 if the server is healthy, or 1 if it is not.",
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		response, err := client.Health(ctx, &v1.HealthRequest{})
+		if err != nil {
+			return fmt.Errorf("checking health: %w", err)
+		}
+
+		return printOutput(cmd, response, "")
+	}),
+}
--- a/cmd/headscale/cli/mockoidc.go
+++ b/cmd/headscale/cli/mockoidc.go
@@ -1,8 +1,8 @@
 package cli

 import (
+	"context"
 	"encoding/json"
-	"errors"
 	"fmt"
 	"net"
 	"net/http"
@@ -10,6 +10,7 @@ import (
 	"strconv"
 	"time"

+	"github.com/juanfont/headscale/hscontrol/util/zlog/zf"
 	"github.com/oauth2-proxy/mockoidc"
 	"github.com/rs/zerolog/log"
 	"github.com/spf13/cobra"
@@ -24,6 +25,7 @@ const (
 	errMockOidcClientIDNotDefined     = Error("MOCKOIDC_CLIENT_ID not defined")
 	errMockOidcClientSecretNotDefined = Error("MOCKOIDC_CLIENT_SECRET not defined")
 	errMockOidcPortNotDefined         = Error("MOCKOIDC_PORT not defined")
+	errMockOidcUsersNotDefined        = Error("MOCKOIDC_USERS not defined")
 	refreshTTL                        = 60 * time.Minute
 )

@@ -37,12 +39,13 @@ var mockOidcCmd = &cobra.Command{
 	Use:   "mockoidc",
 	Short: "Runs a mock OIDC server for testing",
 	Long:  "This internal command runs a OpenID Connect for testing purposes",
-	Run: func(cmd *cobra.Command, args []string) {
+	RunE: func(cmd *cobra.Command, args []string) error {
 		err := mockOIDC()
 		if err != nil {
-			log.Error().Err(err).Msgf("Error running mock OIDC server")
-			os.Exit(1)
+			return fmt.Errorf("running mock OIDC server: %w", err)
 		}
+
+		return nil
 	},
 }

@@ -51,41 +54,47 @@ func mockOIDC() error {
 	if clientID == "" {
 		return errMockOidcClientIDNotDefined
 	}
+
 	clientSecret := os.Getenv("MOCKOIDC_CLIENT_SECRET")
 	if clientSecret == "" {
 		return errMockOidcClientSecretNotDefined
 	}
+
 	addrStr := os.Getenv("MOCKOIDC_ADDR")
 	if addrStr == "" {
 		return errMockOidcPortNotDefined
 	}
+
 	portStr := os.Getenv("MOCKOIDC_PORT")
 	if portStr == "" {
 		return errMockOidcPortNotDefined
 	}
+
 	accessTTLOverride := os.Getenv("MOCKOIDC_ACCESS_TTL")
 	if accessTTLOverride != "" {
 		newTTL, err := time.ParseDuration(accessTTLOverride)
 		if err != nil {
 			return err
 		}
+
 		accessTTL = newTTL
 	}

 	userStr := os.Getenv("MOCKOIDC_USERS")
 	if userStr == "" {
-		return errors.New("MOCKOIDC_USERS not defined")
+		return errMockOidcUsersNotDefined
 	}

 	var users []mockoidc.MockUser
+
 	err := json.Unmarshal([]byte(userStr), &users)
 	if err != nil {
 		return fmt.Errorf("unmarshalling users: %w", err)
 	}

-	log.Info().Interface("users", users).Msg("loading users from JSON")
+	log.Info().Interface(zf.Users, users).Msg("loading users from JSON")

-	log.Info().Msgf("Access token TTL: %s", accessTTL)
+	log.Info().Msgf("access token TTL: %s", accessTTL)

 	port, err := strconv.Atoi(portStr)
 	if err != nil {
@@ -97,7 +106,7 @@ func mockOIDC() error {
 		return err
 	}

-	listener, err := net.Listen("tcp", fmt.Sprintf("%s:%d", addrStr, port))
+	listener, err := new(net.ListenConfig).Listen(context.Background(), "tcp", fmt.Sprintf("%s:%d", addrStr, port))
 	if err != nil {
 		return err
 	}
@@ -106,8 +115,10 @@ func mockOIDC() error {
 	if err != nil {
 		return err
 	}
-	log.Info().Msgf("Mock OIDC server listening on %s", listener.Addr().String())
-	log.Info().Msgf("Issuer: %s", mock.Issuer())
+
+	log.Info().Msgf("mock OIDC server listening on %s", listener.Addr().String())
+	log.Info().Msgf("issuer: %s", mock.Issuer())
+
 	c := make(chan struct{})
 	<-c

@@ -138,12 +149,13 @@ func getMockOIDC(clientID string, clientSecret string, users []mockoidc.MockUser
 		ErrorQueue:                    &mockoidc.ErrorQueue{},
 	}

-	mock.AddMiddleware(func(h http.Handler) http.Handler {
+	_ = mock.AddMiddleware(func(h http.Handler) http.Handler {
 		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-			log.Info().Msgf("Request: %+v", r)
+			log.Info().Msgf("request: %+v", r)
 			h.ServeHTTP(w, r)
+
 			if r.Response != nil {
-				log.Info().Msgf("Response: %+v", r.Response)
+				log.Info().Msgf("response: %+v", r.Response)
 			}
 		})
 	})
--- a/cmd/headscale/cli/nodes.go
+++ b/cmd/headscale/cli/nodes.go
--- a/cmd/headscale/cli/policy.go
+++ b/cmd/headscale/cli/policy.go
@@ -1,33 +1,54 @@
 package cli

 import (
-	"context"
+	"errors"
 	"fmt"
-	"io"
 	"os"

 	v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
+	"github.com/juanfont/headscale/hscontrol/db"
 	"github.com/juanfont/headscale/hscontrol/policy"
 	"github.com/juanfont/headscale/hscontrol/types"
-	"github.com/rs/zerolog/log"
 	"github.com/spf13/cobra"
 	"tailscale.com/types/views"
 )

+const (
+	bypassFlag = "bypass-grpc-and-access-database-directly" //nolint:gosec // not a credential
+)
+
+var errAborted = errors.New("command aborted by user")
+
+// bypassDatabase loads the server config and opens the database directly,
+// bypassing the gRPC server. The caller is responsible for closing the
+// returned database handle.
+func bypassDatabase() (*db.HSDatabase, error) {
+	cfg, err := types.LoadServerConfig()
+	if err != nil {
+		return nil, fmt.Errorf("loading config: %w", err)
+	}
+
+	d, err := db.NewHeadscaleDatabase(cfg, nil)
+	if err != nil {
+		return nil, fmt.Errorf("opening database: %w", err)
+	}
+
+	return d, nil
+}
+
 func init() {
 	rootCmd.AddCommand(policyCmd)
+
+	getPolicy.Flags().BoolP(bypassFlag, "", false, "Use the headscale config to access the database directly, bypassing gRPC; does not require the server to be running")
 	policyCmd.AddCommand(getPolicy)

 	setPolicy.Flags().StringP("file", "f", "", "Path to a policy file in HuJSON format")
-	if err := setPolicy.MarkFlagRequired("file"); err != nil {
-		log.Fatal().Err(err).Msg("")
-	}
+	setPolicy.Flags().BoolP(bypassFlag, "", false, "Use the headscale config to access the database directly, bypassing gRPC; does not require the server to be running")
+	mustMarkRequired(setPolicy, "file")
 	policyCmd.AddCommand(setPolicy)

 	checkPolicy.Flags().StringP("file", "f", "", "Path to a policy file in HuJSON format")
-	if err := checkPolicy.MarkFlagRequired("file"); err != nil {
-		log.Fatal().Err(err).Msg("")
-	}
+	mustMarkRequired(checkPolicy, "file")
 	policyCmd.AddCommand(checkPolicy)
 }

@@ -40,27 +61,46 @@ var getPolicy = &cobra.Command{
 	Use:     "get",
 	Short:   "Print the current ACL Policy",
 	Aliases: []string{"show", "view", "fetch"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
-
-		err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			request := &v1.GetPolicyRequest{}
-
-			response, err := client.GetPolicy(ctx, request)
-			if err != nil {
-				ErrorOutput(err, fmt.Sprintf("Failed loading ACL Policy: %s", err), output)
-				return err
+	RunE: func(cmd *cobra.Command, args []string) error {
+		var policyData string
+		if bypass, _ := cmd.Flags().GetBool(bypassFlag); bypass {
+			if !confirmAction(cmd, "DO NOT run this command if an instance of headscale is running. Are you sure headscale is not running?") {
+				return errAborted
 			}

-			// TODO(pallabpain): Maybe print this better?
-			// This does not pass output as we dont support yaml, json or json-line
-			// output for this command. It is HuJSON already.
-			SuccessOutput("", response.GetPolicy(), "")
-			return nil
-		})
-		if err != nil {
-			return
+			d, err := bypassDatabase()
+			if err != nil {
+				return err
+			}
+			defer d.Close()
+
+			pol, err := d.GetPolicy()
+			if err != nil {
+				return fmt.Errorf("loading policy from database: %w", err)
+			}
+
+			policyData = pol.Data
+		} else {
+			ctx, client, conn, cancel, err := newHeadscaleCLIWithConfig()
+			if err != nil {
+				return fmt.Errorf("connecting to headscale: %w", err)
+			}
+			defer cancel()
+			defer conn.Close()
+
+			response, err := client.GetPolicy(ctx, &v1.GetPolicyRequest{})
+			if err != nil {
+				return fmt.Errorf("loading ACL policy: %w", err)
+			}
+
+			policyData = response.GetPolicy()
 		}
+
+		// This does not pass output format as we don't support yaml, json or
+		// json-line output for this command. It is HuJSON already.
+		fmt.Println(policyData)
+
+		return nil
 	},
 }

@@ -71,66 +111,79 @@ var setPolicy = &cobra.Command{
 	Updates the existing ACL Policy with the provided policy. The policy must be a valid HuJSON object.
 	This command only works when the acl.policy_mode is set to "db", and the policy will be stored in the database.`,
 	Aliases: []string{"put", "update"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
+	RunE: func(cmd *cobra.Command, args []string) error {
 		policyPath, _ := cmd.Flags().GetString("file")

-		f, err := os.Open(policyPath)
+		policyBytes, err := os.ReadFile(policyPath)
 		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error opening the policy file: %s", err), output)
-			return
-		}
-		defer f.Close()
-
-		policyBytes, err := io.ReadAll(f)
-		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error reading the policy file: %s", err), output)
-			return
+			return fmt.Errorf("reading policy file: %w", err)
 		}

-		request := &v1.SetPolicyRequest{Policy: string(policyBytes)}
-
-		err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			if _, err := client.SetPolicy(ctx, request); err != nil {
-				ErrorOutput(err, fmt.Sprintf("Failed to set ACL Policy: %s", err), output)
-				return err
+		if bypass, _ := cmd.Flags().GetBool(bypassFlag); bypass {
+			if !confirmAction(cmd, "DO NOT run this command if an instance of headscale is running. Are you sure headscale is not running?") {
+				return errAborted
 			}

-			SuccessOutput(nil, "Policy updated.", "")
-			return nil
-		})
-		if err != nil {
-			return
+			d, err := bypassDatabase()
+			if err != nil {
+				return err
+			}
+			defer d.Close()
+
+			users, err := d.ListUsers()
+			if err != nil {
+				return fmt.Errorf("loading users for policy validation: %w", err)
+			}
+
+			_, err = policy.NewPolicyManager(policyBytes, users, views.Slice[types.NodeView]{})
+			if err != nil {
+				return fmt.Errorf("parsing policy file: %w", err)
+			}
+
+			_, err = d.SetPolicy(string(policyBytes))
+			if err != nil {
+				return fmt.Errorf("setting ACL policy: %w", err)
+			}
+		} else {
+			request := &v1.SetPolicyRequest{Policy: string(policyBytes)}
+
+			ctx, client, conn, cancel, err := newHeadscaleCLIWithConfig()
+			if err != nil {
+				return fmt.Errorf("connecting to headscale: %w", err)
+			}
+			defer cancel()
+			defer conn.Close()
+
+			_, err = client.SetPolicy(ctx, request)
+			if err != nil {
+				return fmt.Errorf("setting ACL policy: %w", err)
+			}
 		}
+
+		fmt.Println("Policy updated.")
+
+		return nil
 	},
 }

 var checkPolicy = &cobra.Command{
 	Use:   "check",
 	Short: "Check the Policy file for errors",
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
+	RunE: func(cmd *cobra.Command, args []string) error {
 		policyPath, _ := cmd.Flags().GetString("file")

-		f, err := os.Open(policyPath)
+		policyBytes, err := os.ReadFile(policyPath)
 		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error opening the policy file: %s", err), output)
-			return
-		}
-		defer f.Close()
-
-		policyBytes, err := io.ReadAll(f)
-		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error reading the policy file: %s", err), output)
-			return
+			return fmt.Errorf("reading policy file: %w", err)
 		}

 		_, err = policy.NewPolicyManager(policyBytes, nil, views.Slice[types.NodeView]{})
 		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error parsing the policy file: %s", err), output)
-			return
+			return fmt.Errorf("parsing policy file: %w", err)
 		}

-		SuccessOutput(nil, "Policy is valid", "")
+		fmt.Println("Policy is valid")
+
+		return nil
 	},
 }
--- a/cmd/headscale/cli/preauthkeys.go
+++ b/cmd/headscale/cli/preauthkeys.go
@@ -5,27 +5,23 @@ import (
 	"fmt"
 	"strconv"
 	"strings"
-	"time"

 	v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
-	"github.com/prometheus/common/model"
+	"github.com/juanfont/headscale/hscontrol/util"
 	"github.com/pterm/pterm"
-	"github.com/rs/zerolog/log"
 	"github.com/spf13/cobra"
-	"google.golang.org/protobuf/types/known/timestamppb"
+)
+
+const (
+	DefaultPreAuthKeyExpiry = "1h"
 )

 func init() {
 	rootCmd.AddCommand(preauthkeysCmd)
-	preauthkeysCmd.PersistentFlags().Uint64P("user", "u", 0, "User identifier (ID)")
-
-	err := preauthkeysCmd.MarkPersistentFlagRequired("user")
-	if err != nil {
-		log.Fatal().Err(err).Msg("")
-	}
 	preauthkeysCmd.AddCommand(listPreAuthKeys)
 	preauthkeysCmd.AddCommand(createPreAuthKeyCmd)
 	preauthkeysCmd.AddCommand(expirePreAuthKeyCmd)
+	preauthkeysCmd.AddCommand(deletePreAuthKeyCmd)
 	createPreAuthKeyCmd.PersistentFlags().
 		Bool("reusable", false, "Make the preauthkey reusable")
 	createPreAuthKeyCmd.PersistentFlags().
@@ -34,6 +30,9 @@ func init() {
 		StringP("expiration", "e", DefaultPreAuthKeyExpiry, "Human-readable expiration of the key (e.g. 30m, 24h)")
 	createPreAuthKeyCmd.Flags().
 		StringSlice("tags", []string{}, "Tags to automatically assign to node")
+	createPreAuthKeyCmd.PersistentFlags().Uint64P("user", "u", 0, "User identifier (ID)")
+	expirePreAuthKeyCmd.PersistentFlags().Uint64P("id", "i", 0, "Authkey ID")
+	deletePreAuthKeyCmd.PersistentFlags().Uint64P("id", "i", 0, "Authkey ID")
 }

 var preauthkeysCmd = &cobra.Command{
@@ -44,196 +43,136 @@ var preauthkeysCmd = &cobra.Command{

 var listPreAuthKeys = &cobra.Command{
 	Use:     "list",
-	Short:   "List the preauthkeys for this user",
+	Short:   "List all preauthkeys",
 	Aliases: []string{"ls", "show"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
-
-		user, err := cmd.Flags().GetUint64("user")
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		response, err := client.ListPreAuthKeys(ctx, &v1.ListPreAuthKeysRequest{})
 		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error getting user: %s", err), output)
-			return
+			return fmt.Errorf("listing preauthkeys: %w", err)
 		}

-		err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			request := &v1.ListPreAuthKeysRequest{
-				User: user,
-			}
-
-			response, err := client.ListPreAuthKeys(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf("Error getting the list of keys: %s", err),
-					output,
-				)
-				return err
-			}
-
-			if output != "" {
-				SuccessOutput(response.GetPreAuthKeys(), "", output)
-				return nil
-			}
-
+		return printListOutput(cmd, response.GetPreAuthKeys(), func() error {
 			tableData := pterm.TableData{
 				{
 					"ID",
-					"Key",
+					"Key/Prefix",
 					"Reusable",
 					"Ephemeral",
 					"Used",
 					"Expiration",
 					"Created",
-					"Tags",
+					"Owner",
 				},
 			}
+
 			for _, key := range response.GetPreAuthKeys() {
 				expiration := "-"
 				if key.GetExpiration() != nil {
 					expiration = ColourTime(key.GetExpiration().AsTime())
 				}

-				aclTags := ""
-
-				for _, tag := range key.GetAclTags() {
-					aclTags += "," + tag
+				var owner string
+				if len(key.GetAclTags()) > 0 {
+					owner = strings.Join(key.GetAclTags(), "\n")
+				} else if key.GetUser() != nil {
+					owner = key.GetUser().GetName()
+				} else {
+					owner = "-"
 				}

-				aclTags = strings.TrimLeft(aclTags, ",")
-
 				tableData = append(tableData, []string{
-					strconv.FormatUint(key.GetId(), 10),
+					strconv.FormatUint(key.GetId(), util.Base10),
 					key.GetKey(),
 					strconv.FormatBool(key.GetReusable()),
 					strconv.FormatBool(key.GetEphemeral()),
 					strconv.FormatBool(key.GetUsed()),
 					expiration,
 					key.GetCreatedAt().AsTime().Format(HeadscaleDateTimeFormat),
-					aclTags,
+					owner,
 				})
+			}

-			}
-			err = pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
-			if err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf("Failed to render pterm table: %s", err),
-					output,
-				)
-				return err
-			}
-			return nil
+			return pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
 		})
-		if err != nil {
-			return
-		}
-	},
+	}),
 }

 var createPreAuthKeyCmd = &cobra.Command{
 	Use:     "create",
-	Short:   "Creates a new preauthkey in the specified user",
+	Short:   "Creates a new preauthkey",
 	Aliases: []string{"c", "new"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
-
-		user, err := cmd.Flags().GetUint64("user")
-		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error getting user: %s", err), output)
-			return
-		}
-
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		user, _ := cmd.Flags().GetUint64("user")
 		reusable, _ := cmd.Flags().GetBool("reusable")
 		ephemeral, _ := cmd.Flags().GetBool("ephemeral")
 		tags, _ := cmd.Flags().GetStringSlice("tags")

+		expiration, err := expirationFromFlag(cmd)
+		if err != nil {
+			return err
+		}
+
 		request := &v1.CreatePreAuthKeyRequest{
-			User:      user,
-			Reusable:  reusable,
-			Ephemeral: ephemeral,
-			AclTags:   tags,
+			User:       user,
+			Reusable:   reusable,
+			Ephemeral:  ephemeral,
+			AclTags:    tags,
+			Expiration: expiration,
 		}

-		durationStr, _ := cmd.Flags().GetString("expiration")
-
-		duration, err := model.ParseDuration(durationStr)
+		response, err := client.CreatePreAuthKey(ctx, request)
 		if err != nil {
-			ErrorOutput(
-				err,
-				fmt.Sprintf("Could not parse duration: %s\n", err),
-				output,
-			)
-			return
+			return fmt.Errorf("creating preauthkey: %w", err)
 		}

-		expiration := time.Now().UTC().Add(time.Duration(duration))
-
-		log.Trace().
-			Dur("expiration", time.Duration(duration)).
-			Msg("expiration has been set")
-
-		request.Expiration = timestamppb.New(expiration)
-
-		err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			response, err := client.CreatePreAuthKey(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf("Cannot create Pre Auth Key: %s\n", err),
-					output,
-				)
-				return err
-			}
-
-			SuccessOutput(response.GetPreAuthKey(), response.GetPreAuthKey().GetKey(), output)
-			return nil
-		})
-		if err != nil {
-			return
-		}
-	},
+		return printOutput(cmd, response.GetPreAuthKey(), response.GetPreAuthKey().GetKey())
+	}),
 }

 var expirePreAuthKeyCmd = &cobra.Command{
-	Use:     "expire KEY",
+	Use:     "expire",
 	Short:   "Expire a preauthkey",
 	Aliases: []string{"revoke", "exp", "e"},
-	Args: func(cmd *cobra.Command, args []string) error {
-		if len(args) < 1 {
-			return errMissingParameter
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		id, _ := cmd.Flags().GetUint64("id")
+
+		if id == 0 {
+			return fmt.Errorf("missing --id parameter: %w", errMissingParameter)
 		}

-		return nil
-	},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
-		user, err := cmd.Flags().GetUint64("user")
+		request := &v1.ExpirePreAuthKeyRequest{
+			Id: id,
+		}
+
+		response, err := client.ExpirePreAuthKey(ctx, request)
 		if err != nil {
-			ErrorOutput(err, fmt.Sprintf("Error getting user: %s", err), output)
-			return
+			return fmt.Errorf("expiring preauthkey: %w", err)
 		}

-		err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			request := &v1.ExpirePreAuthKeyRequest{
-				User: user,
-				Key:  args[0],
-			}
-
-			response, err := client.ExpirePreAuthKey(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf("Cannot expire Pre Auth Key: %s\n", err),
-					output,
-				)
-				return err
-			}
-
-			SuccessOutput(response, "Key expired", output)
-			return nil
-		})
-		if err != nil {
-			return
-		}
-	},
+		return printOutput(cmd, response, "Key expired")
+	}),
+}
+
+var deletePreAuthKeyCmd = &cobra.Command{
+	Use:     "delete",
+	Short:   "Delete a preauthkey",
+	Aliases: []string{"del", "rm", "d"},
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		id, _ := cmd.Flags().GetUint64("id")
+
+		if id == 0 {
+			return fmt.Errorf("missing --id parameter: %w", errMissingParameter)
+		}
+
+		request := &v1.DeletePreAuthKeyRequest{
+			Id: id,
+		}
+
+		response, err := client.DeletePreAuthKey(ctx, request)
+		if err != nil {
+			return fmt.Errorf("deleting preauthkey: %w", err)
+		}
+
+		return printOutput(cmd, response, "Key deleted")
+	}),
 }
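Note on the new "Owner" column in the list command above: a key's owner is shown as its ACL tags (one per line) when any are set, otherwise as the owning user's name, otherwise as "-". The standalone sketch below restates that precedence outside the CLI. The `user` type and `ownerLabel` function are hypothetical stand-ins for the generated `v1.PreAuthKey` getters and are not part of this patch.

```go
package main

import (
	"fmt"
	"strings"
)

// user is a hypothetical stand-in for the generated v1.User message.
type user struct{ name string }

// ownerLabel mirrors the precedence used for the "Owner" column:
// ACL tags first (one per line), then the owning user, then "-".
func ownerLabel(tags []string, u *user) string {
	switch {
	case len(tags) > 0:
		return strings.Join(tags, "\n")
	case u != nil:
		return u.name
	default:
		return "-"
	}
}

func main() {
	alice := &user{name: "alice"}

	fmt.Printf("%q\n", ownerLabel([]string{"tag:web", "tag:db"}, alice)) // "tag:web\ntag:db"
	fmt.Printf("%q\n", ownerLabel(nil, alice))                           // "alice"
	fmt.Printf("%q\n", ownerLabel(nil, nil))                             // "-"
}
```

Tags win over the user even when both are present, which matches the `if / else if / else` order in the patch.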
--- a/cmd/headscale/cli/root.go
+++ b/cmd/headscale/cli/root.go
@@ -1,10 +1,10 @@
 package cli

 import (
-	"fmt"
 	"os"
 	"runtime"
 	"slices"
+	"strings"

 	"github.com/juanfont/headscale/hscontrol/types"
 	"github.com/rs/zerolog"
@@ -34,25 +34,34 @@ func init() {
 		StringP("output", "o", "", "Output format. Empty for human-readable, 'json', 'json-line' or 'yaml'")
 	rootCmd.PersistentFlags().
 		Bool("force", false, "Disable prompts and forces the execution")
+
+	// Re-enable usage output only for flag-parsing errors; runtime errors
+	// from RunE should never dump usage text.
+	rootCmd.SetFlagErrorFunc(func(cmd *cobra.Command, err error) error {
+		cmd.SilenceUsage = false
+
+		return err
+	})
 }

 func initConfig() {
 	if cfgFile == "" {
 		cfgFile = os.Getenv("HEADSCALE_CONFIG")
 	}
+
 	if cfgFile != "" {
 		err := types.LoadConfig(cfgFile, true)
 		if err != nil {
-			log.Fatal().Caller().Err(err).Msgf("Error loading config file %s", cfgFile)
+			log.Fatal().Caller().Err(err).Msgf("error loading config file %s", cfgFile)
 		}
 	} else {
 		err := types.LoadConfig("", false)
 		if err != nil {
-			log.Fatal().Caller().Err(err).Msgf("Error loading config")
+			log.Fatal().Caller().Err(err).Msgf("error loading config")
 		}
 	}

-	machineOutput := HasMachineOutputFlag()
+	machineOutput := hasMachineOutputFlag()

 	// If the user has requested a "node" readable format,
 	// then disable login so the output remains valid.
@@ -67,25 +76,66 @@ func initConfig() {

 	disableUpdateCheck := viper.GetBool("disable_check_updates")
 	if !disableUpdateCheck && !machineOutput {
+		versionInfo := types.GetVersionInfo()
 		if (runtime.GOOS == "linux" || runtime.GOOS == "darwin") &&
-			types.Version != "dev" {
+			!versionInfo.Dirty {
 			githubTag := &latest.GithubTag{
-				Owner:      "juanfont",
-				Repository: "headscale",
+				Owner:         "juanfont",
+				Repository:    "headscale",
+				TagFilterFunc: filterPreReleasesIfStable(func() string { return versionInfo.Version }),
 			}
-			res, err := latest.Check(githubTag, types.Version)
+
+			res, err := latest.Check(githubTag, versionInfo.Version)
 			if err == nil && res.Outdated {
 				//nolint
 				log.Warn().Msgf(
 					"An updated version of Headscale has been found (%s vs. your current %s). Check it out https://github.com/juanfont/headscale/releases\n",
 					res.Current,
-					types.Version,
+					versionInfo.Version,
 				)
 			}
 		}
 	}
 }

+var prereleases = []string{"alpha", "beta", "rc", "dev"}
+
+func isPreReleaseVersion(version string) bool {
+	for _, unstable := range prereleases {
+		if strings.Contains(version, unstable) {
+			return true
+		}
+	}
+
+	return false
+}
+
+// filterPreReleasesIfStable returns a function that filters out
+// pre-release tags if the current version is stable.
+// If the current version is a pre-release, it does not filter anything.
+// versionFunc returns the current version string; it is a func rather than
+// a plain string for testability.
+func filterPreReleasesIfStable(versionFunc func() string) func(string) bool {
+	return func(tag string) bool {
+		version := versionFunc()
+
+		// If we are on a pre-release version, we do not filter anything,
+		// because we want to recommend the latest pre-release to the user.
+		if isPreReleaseVersion(version) {
+			return false
+		}
+
+		// If we are on a stable release, filter out pre-releases.
+		for _, ignore := range prereleases {
+			if strings.Contains(tag, ignore) {
+				return true
+			}
+		}
+
+		return false
+	}
+}
+
 var rootCmd = &cobra.Command{
 	Use:   "headscale",
 	Short: "headscale - a Tailscale control server",
@@ -93,11 +143,15 @@ var rootCmd = &cobra.Command{
 headscale is an open source implementation of the Tailscale control server

 https://github.com/juanfont/headscale`,
+	SilenceErrors: true,
+	SilenceUsage:  true,
 }

 func Execute() {
-	if err := rootCmd.Execute(); err != nil {
-		fmt.Fprintln(os.Stderr, err)
+	cmd, err := rootCmd.ExecuteC()
+	if err != nil {
+		outputFormat, _ := cmd.Flags().GetString("output")
+		printError(err, outputFormat)
 		os.Exit(1)
 	}
 }
--- a/cmd/headscale/cli/root_test.go
+++ b/cmd/headscale/cli/root_test.go
@@ -0,0 +1,293 @@
+package cli
+
+import (
+	"testing"
+)
+
+func TestFilterPreReleasesIfStable(t *testing.T) {
+	tests := []struct {
+		name           string
+		currentVersion string
+		tag            string
+		expectedFilter bool
+		description    string
+	}{
+		{
+			name:           "stable version filters alpha tag",
+			currentVersion: "0.23.0",
+			tag:            "v0.24.0-alpha.1",
+			expectedFilter: true,
+			description:    "When on stable release, alpha tags should be filtered",
+		},
+		{
+			name:           "stable version filters beta tag",
+			currentVersion: "0.23.0",
+			tag:            "v0.24.0-beta.2",
+			expectedFilter: true,
+			description:    "When on stable release, beta tags should be filtered",
+		},
+		{
+			name:           "stable version filters rc tag",
+			currentVersion: "0.23.0",
+			tag:            "v0.24.0-rc.1",
+			expectedFilter: true,
+			description:    "When on stable release, rc tags should be filtered",
+		},
+		{
+			name:           "stable version allows stable tag",
+			currentVersion: "0.23.0",
+			tag:            "v0.24.0",
+			expectedFilter: false,
+			description:    "When on stable release, stable tags should not be filtered",
+		},
+		{
+			name:           "alpha version allows alpha tag",
+			currentVersion: "0.23.0-alpha.1",
+			tag:            "v0.24.0-alpha.2",
+			expectedFilter: false,
+			description:    "When on alpha release, alpha tags should not be filtered",
+		},
+		{
+			name:           "alpha version allows beta tag",
+			currentVersion: "0.23.0-alpha.1",
+			tag:            "v0.24.0-beta.1",
+			expectedFilter: false,
+			description:    "When on alpha release, beta tags should not be filtered",
+		},
+		{
+			name:           "alpha version allows rc tag",
+			currentVersion: "0.23.0-alpha.1",
+			tag:            "v0.24.0-rc.1",
+			expectedFilter: false,
+			description:    "When on alpha release, rc tags should not be filtered",
+		},
+		{
+			name:           "alpha version allows stable tag",
+			currentVersion: "0.23.0-alpha.1",
+			tag:            "v0.24.0",
+			expectedFilter: false,
+			description:    "When on alpha release, stable tags should not be filtered",
+		},
+		{
+			name:           "beta version allows alpha tag",
+			currentVersion: "0.23.0-beta.1",
+			tag:            "v0.24.0-alpha.1",
+			expectedFilter: false,
+			description:    "When on beta release, alpha tags should not be filtered",
+		},
+		{
+			name:           "beta version allows beta tag",
+			currentVersion: "0.23.0-beta.2",
+			tag:            "v0.24.0-beta.3",
+			expectedFilter: false,
+			description:    "When on beta release, beta tags should not be filtered",
+		},
+		{
+			name:           "beta version allows rc tag",
+			currentVersion: "0.23.0-beta.1",
+			tag:            "v0.24.0-rc.1",
+			expectedFilter: false,
+			description:    "When on beta release, rc tags should not be filtered",
+		},
+		{
+			name:           "beta version allows stable tag",
+			currentVersion: "0.23.0-beta.1",
+			tag:            "v0.24.0",
+			expectedFilter: false,
+			description:    "When on beta release, stable tags should not be filtered",
+		},
+		{
+			name:           "rc version allows alpha tag",
+			currentVersion: "0.23.0-rc.1",
+			tag:            "v0.24.0-alpha.1",
+			expectedFilter: false,
+			description:    "When on rc release, alpha tags should not be filtered",
+		},
+		{
+			name:           "rc version allows beta tag",
+			currentVersion: "0.23.0-rc.1",
+			tag:            "v0.24.0-beta.1",
+			expectedFilter: false,
+			description:    "When on rc release, beta tags should not be filtered",
+		},
+		{
+			name:           "rc version allows rc tag",
+			currentVersion: "0.23.0-rc.2",
+			tag:            "v0.24.0-rc.3",
+			expectedFilter: false,
+			description:    "When on rc release, rc tags should not be filtered",
+		},
+		{
+			name:           "rc version allows stable tag",
+			currentVersion: "0.23.0-rc.1",
+			tag:            "v0.24.0",
+			expectedFilter: false,
+			description:    "When on rc release, stable tags should not be filtered",
+		},
+		{
+			name:           "stable version with patch filters alpha",
+			currentVersion: "0.23.1",
+			tag:            "v0.24.0-alpha.1",
+			expectedFilter: true,
+			description:    "Stable version with patch number should filter alpha tags",
+		},
+		{
+			name:           "stable version with patch allows stable",
+			currentVersion: "0.23.1",
+			tag:            "v0.24.0",
+			expectedFilter: false,
+			description:    "Stable version with patch number should allow stable tags",
+		},
+		{
+			name:           "tag with alpha substring in version number",
+			currentVersion: "0.23.0",
+			tag:            "v1.0.0-alpha.1",
+			expectedFilter: true,
+			description:    "Tags with alpha in version string should be filtered on stable",
+		},
+		{
+			name:           "tag with beta substring in version number",
+			currentVersion: "0.23.0",
+			tag:            "v1.0.0-beta.1",
+			expectedFilter: true,
+			description:    "Tags with beta in version string should be filtered on stable",
+		},
+		{
+			name:           "tag with rc substring in version number",
+			currentVersion: "0.23.0",
+			tag:            "v1.0.0-rc.1",
+			expectedFilter: true,
+			description:    "Tags with rc in version string should be filtered on stable",
+		},
+		{
+			name:           "empty tag on stable version",
+			currentVersion: "0.23.0",
+			tag:            "",
+			expectedFilter: false,
+			description:    "Empty tags should not be filtered",
+		},
+		{
+			name:           "dev version allows all tags",
+			currentVersion: "0.23.0-dev",
+			tag:            "v0.24.0-alpha.1",
+			expectedFilter: false,
+			description:    "Dev versions should not filter any tags (pre-release allows all)",
+		},
+		{
+			name:           "stable version filters dev tag",
+			currentVersion: "0.23.0",
+			tag:            "v0.24.0-dev",
+			expectedFilter: true,
+			description:    "When on stable release, dev tags should be filtered",
+		},
+		{
+			name:           "dev version allows dev tag",
+			currentVersion: "0.23.0-dev",
+			tag:            "v0.24.0-dev.1",
+			expectedFilter: false,
+			description:    "When on dev release, dev tags should not be filtered",
+		},
+		{
+			name:           "dev version allows stable tag",
+			currentVersion: "0.23.0-dev",
+			tag:            "v0.24.0",
+			expectedFilter: false,
+			description:    "When on dev release, stable tags should not be filtered",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			result := filterPreReleasesIfStable(func() string { return tt.currentVersion })(tt.tag)
+			if result != tt.expectedFilter {
+				t.Errorf("%s: got %v, want %v\nDescription: %s\nCurrent version: %s, Tag: %s",
+					tt.name,
+					result,
+					tt.expectedFilter,
+					tt.description,
+					tt.currentVersion,
+					tt.tag,
+				)
+			}
+		})
+	}
+}
+
+func TestIsPreReleaseVersion(t *testing.T) {
+	tests := []struct {
+		name        string
+		version     string
+		expected    bool
+		description string
+	}{
+		{
+			name:        "stable version",
+			version:     "0.23.0",
+			expected:    false,
+			description: "Stable version should not be pre-release",
+		},
+		{
+			name:        "alpha version",
+			version:     "0.23.0-alpha.1",
+			expected:    true,
+			description: "Alpha version should be pre-release",
+		},
+		{
+			name:        "beta version",
+			version:     "0.23.0-beta.1",
+			expected:    true,
+			description: "Beta version should be pre-release",
+		},
+		{
+			name:        "rc version",
+			version:     "0.23.0-rc.1",
+			expected:    true,
+			description: "RC version should be pre-release",
+		},
+		{
+			name:        "version with alpha substring",
+			version:     "0.23.0-alphabetical",
+			expected:    true,
+			description: "Version containing 'alpha' should be pre-release",
+		},
+		{
+			name:        "version with beta substring",
+			version:     "0.23.0-betamax",
+			expected:    true,
+			description: "Version containing 'beta' should be pre-release",
+		},
+		{
+			name:        "dev version",
+			version:     "0.23.0-dev",
+			expected:    true,
+			description: "Dev version should be pre-release",
+		},
+		{
+			name:        "empty version",
+			version:     "",
+			expected:    false,
+			description: "Empty version should not be pre-release",
+		},
+		{
+			name:        "version with patch number",
+			version:     "0.23.1",
+			expected:    false,
+			description: "Stable version with patch should not be pre-release",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			result := isPreReleaseVersion(tt.version)
+			if result != tt.expected {
+				t.Errorf("%s: got %v, want %v\nDescription: %s\nVersion: %s",
+					tt.name,
+					result,
+					tt.expected,
+					tt.description,
+					tt.version,
+				)
+			}
+		})
+	}
+}
--- a/cmd/headscale/cli/serve.go
+++ b/cmd/headscale/cli/serve.go
@@ -5,7 +5,6 @@ import (
 	"fmt"
 	"net/http"

-	"github.com/rs/zerolog/log"
 	"github.com/spf13/cobra"
 	"github.com/tailscale/squibble"
 )
@@ -17,24 +16,22 @@ func init() {
 var serveCmd = &cobra.Command{
 	Use:   "serve",
 	Short: "Launches the headscale server",
-	Args: func(cmd *cobra.Command, args []string) error {
-		return nil
-	},
-	Run: func(cmd *cobra.Command, args []string) {
+	RunE: func(cmd *cobra.Command, args []string) error {
 		app, err := newHeadscaleServerWithConfig()
 		if err != nil {
-			var squibbleErr squibble.ValidationError
-			if errors.As(err, &squibbleErr) {
+			if squibbleErr, ok := errors.AsType[squibble.ValidationError](err); ok {
 				fmt.Printf("SQLite schema failed to validate:\n")
 				fmt.Println(squibbleErr.Diff)
 			}

-			log.Fatal().Caller().Err(err).Msg("Error initializing")
+			return fmt.Errorf("initializing: %w", err)
 		}

 		err = app.Serve()
 		if err != nil && !errors.Is(err, http.ErrServerClosed) {
-			log.Fatal().Caller().Err(err).Msg("Headscale ran into an error and had to shut down.")
+			return fmt.Errorf("headscale ran into an error and had to shut down: %w", err)
 		}
+
+		return nil
 	},
 }
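Note on the `serve` change above: `log.Fatal` is replaced by returned errors, so the caller decides how to exit, and `http.ErrServerClosed` is treated as a clean shutdown. The standalone sketch below shows that pattern using only the standard library. `runServe`, `cleanShutdown`, `failingServe` and `errListenFailed` are hypothetical names chosen for illustration; they are not headscale functions.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errListenFailed is a hypothetical failure used only in this sketch.
var errListenFailed = errors.New("listen failed")

// runServe wraps a blocking serve function and returns an error instead
// of terminating the process. A (possibly wrapped) http.ErrServerClosed
// is treated as a normal shutdown.
func runServe(serve func() error) error {
	err := serve()
	if err != nil && !errors.Is(err, http.ErrServerClosed) {
		return fmt.Errorf("headscale ran into an error and had to shut down: %w", err)
	}

	return nil
}

// cleanShutdown simulates a server that was stopped gracefully.
func cleanShutdown() error { return fmt.Errorf("serve: %w", http.ErrServerClosed) }

// failingServe simulates a server that failed to start.
func failingServe() error { return errListenFailed }

func main() {
	fmt.Println(runServe(cleanShutdown)) // <nil>
	fmt.Println(runServe(failingServe))  // headscale ran into an error and had to shut down: listen failed
}
```

Because the error is wrapped with `%w`, a caller can still match the underlying cause with `errors.Is`, and a function shaped like this can be exercised in a test without exiting the test process.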
--- a/cmd/headscale/cli/serve_test.go
+++ b/cmd/headscale/cli/serve_test.go
@@ -1,70 +0,0 @@
-package cli
-
-import (
-	"testing"
-
-	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"
-)
-
-func TestServeCommand(t *testing.T) {
-	// Test that the serve command exists and is properly configured
-	assert.NotNil(t, serveCmd)
-	assert.Equal(t, "serve", serveCmd.Use)
-	assert.Equal(t, "Launches the headscale server", serveCmd.Short)
-	assert.NotNil(t, serveCmd.Run)
-	assert.NotNil(t, serveCmd.Args)
-}
-
-func TestServeCommandInRootCommand(t *testing.T) {
-	// Test that serve is available as a subcommand of root
-	cmd, _, err := rootCmd.Find([]string{"serve"})
-	require.NoError(t, err)
-	assert.Equal(t, "serve", cmd.Name())
-	assert.Equal(t, serveCmd, cmd)
-}
-
-func TestServeCommandArgs(t *testing.T) {
-	// Test that the Args function is defined and accepts any arguments
-	// The current implementation always returns nil (accepts any args)
-	assert.NotNil(t, serveCmd.Args)
-
-	// Test the args function directly
-	err := serveCmd.Args(serveCmd, []string{})
-	assert.NoError(t, err, "Args function should accept empty arguments")
-
-	err = serveCmd.Args(serveCmd, []string{"extra", "args"})
-	assert.NoError(t, err, "Args function should accept extra arguments")
-}
-
-func TestServeCommandHelp(t *testing.T) {
-	// Test that the command has proper help text
-	assert.NotEmpty(t, serveCmd.Short)
-	assert.Contains(t, serveCmd.Short, "server")
-	assert.Contains(t, serveCmd.Short, "headscale")
-}
-
-func TestServeCommandStructure(t *testing.T) {
-	// Test basic command structure
-	assert.Equal(t, "serve", serveCmd.Name())
-	assert.Equal(t, "Launches the headscale server", serveCmd.Short)
-
-	// Test that it has no subcommands (it's a leaf command)
-	subcommands := serveCmd.Commands()
-	assert.Empty(t, subcommands, "Serve command should not have subcommands")
-}
-
-// Note: We can't easily test the actual execution of serve because:
-// 1. It depends on configuration files being present and valid
-// 2. It calls log.Fatal() which would exit the test process
-// 3. It tries to start an actual HTTP server which would block forever
-// 4. It requires database connections and other infrastructure
-//
-// In a real refactor, we would:
-// 1. Extract server initialization logic to a testable function
-// 2. Use dependency injection for configuration and dependencies
-// 3. Return errors instead of calling log.Fatal()
-// 4. Add graceful shutdown capabilities for testing
-// 5. Allow server startup to be cancelled via context
-//
-// For now, we test the command structure and basic properties.
--- a/cmd/headscale/cli/table_filter.go
+++ b/cmd/headscale/cli/table_filter.go
@@ -1,55 +0,0 @@
-package cli
-
-import (
-	"strings"
-
-	"github.com/pterm/pterm"
-	"github.com/spf13/cobra"
-)
-
-const (
-	HeadscaleDateTimeFormat = "2006-01-02 15:04:05"
-	DefaultAPIKeyExpiry     = "90d"
-	DefaultPreAuthKeyExpiry = "1h"
-)
-
-// FilterTableColumns filters table columns based on --columns flag
-func FilterTableColumns(cmd *cobra.Command, tableData pterm.TableData) pterm.TableData {
-	columns, _ := cmd.Flags().GetString("columns")
-	if columns == "" || len(tableData) == 0 {
-		return tableData
-	}
-
-	headers := tableData[0]
-	wantedColumns := strings.Split(columns, ",")
-
-	// Find column indices
-	var indices []int
-	for _, wanted := range wantedColumns {
-		wanted = strings.TrimSpace(wanted)
-		for i, header := range headers {
-			if strings.EqualFold(header, wanted) {
-				indices = append(indices, i)
-				break
-			}
-		}
-	}
-
-	if len(indices) == 0 {
-		return tableData
-	}
-
-	// Filter all rows
-	filtered := make(pterm.TableData, len(tableData))
-	for i, row := range tableData {
-		newRow := make([]string, len(indices))
-		for j, idx := range indices {
-			if idx < len(row) {
-				newRow[j] = row[idx]
-			}
-		}
-		filtered[i] = newRow
-	}
-
-	return filtered
-}
--- a/cmd/headscale/cli/users.go
+++ b/cmd/headscale/cli/users.go
@@ -6,34 +6,42 @@ import (
 	"fmt"
 	"net/url"
 	"strconv"
-	"strings"

-	survey "github.com/AlecAivazis/survey/v2"
 	v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
+	"github.com/juanfont/headscale/hscontrol/util"
+	"github.com/juanfont/headscale/hscontrol/util/zlog/zf"
 	"github.com/pterm/pterm"
 	"github.com/rs/zerolog/log"
 	"github.com/spf13/cobra"
-	"google.golang.org/grpc/status"
+)
+
+// CLI user errors.
+var (
+	errFlagRequired       = errors.New("--name or --identifier flag is required")
+	errMultipleUsersMatch = errors.New("multiple users match query, specify an ID")
 )

 func usernameAndIDFlag(cmd *cobra.Command) {
-	cmd.Flags().StringP("user", "u", "", "User identifier (ID, name, or email)")
+	cmd.Flags().Int64P("identifier", "i", -1, "User identifier (ID)")
 	cmd.Flags().StringP("name", "n", "", "Username")
 }

-// userIDFromFlag returns the user ID using smart lookup.
-// If no user is specified, it will exit the program with an error.
-func userIDFromFlag(cmd *cobra.Command) uint64 {
-	userID, err := GetUserIdentifier(cmd)
-	if err != nil {
-		ErrorOutput(
-			err,
-			"Cannot identify user: "+err.Error(),
-			GetOutputFlag(cmd),
-		)
+// usernameAndIDFromFlag returns the username and ID from the flags of the command.
+func usernameAndIDFromFlag(cmd *cobra.Command) (uint64, string, error) {
+	username, _ := cmd.Flags().GetString("name")
+
+	identifier, _ := cmd.Flags().GetInt64("identifier")
+	if username == "" && identifier < 0 {
+		return 0, "", errFlagRequired
 	}

-	return userID
+	// Normalise unset/negative identifiers to 0 so the uint64
+	// conversion does not produce a bogus large value.
+	if identifier < 0 {
+		identifier = 0
+	}
+
+	return uint64(identifier), username, nil //nolint:gosec // identifier is clamped to >= 0 above
 }

 func init() {
@@ -43,26 +51,20 @@ func init() {
 	createUserCmd.Flags().StringP("email", "e", "", "Email")
 	createUserCmd.Flags().StringP("picture-url", "p", "", "Profile picture URL")
 	userCmd.AddCommand(listUsersCmd)
-	// Smart lookup filters - can be used individually or combined
-	listUsersCmd.Flags().StringP("user", "u", "", "Filter by user (ID, name, or email)")
-	listUsersCmd.Flags().Uint64P("id", "", 0, "Filter by user ID")
-	listUsersCmd.Flags().StringP("name", "n", "", "Filter by username")
-	listUsersCmd.Flags().StringP("email", "e", "", "Filter by email address")
-	listUsersCmd.Flags().String("columns", "", "Comma-separated list of columns to display (ID,Name,Username,Email,Created)")
+	usernameAndIDFlag(listUsersCmd)
+	listUsersCmd.Flags().StringP("email", "e", "", "Email")
 	userCmd.AddCommand(destroyUserCmd)
 	usernameAndIDFlag(destroyUserCmd)
 	userCmd.AddCommand(renameUserCmd)
 	usernameAndIDFlag(renameUserCmd)
 	renameUserCmd.Flags().StringP("new-name", "r", "", "New username")
-	renameUserCmd.MarkFlagRequired("new-name")
+	mustMarkRequired(renameUserCmd, "new-name")
 }

-var errMissingParameter = errors.New("missing parameters")
-
 var userCmd = &cobra.Command{
 	Use:     "users",
 	Short:   "Manage the users of Headscale",
-	Aliases: []string{"user", "namespace", "namespaces", "ns"},
+	Aliases: []string{"user"},
 }

 var createUserCmd = &cobra.Command{
@@ -76,10 +78,11 @@ var createUserCmd = &cobra.Command{

 		return nil
 	},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
 		userName := args[0]

+		log.Trace().Interface(zf.Client, client).Msg("obtained gRPC client")
+
 		request := &v1.CreateUserRequest{Name: userName}

 		if displayName, _ := cmd.Flags().GetString("display-name"); displayName != "" {
@@ -91,177 +94,101 @@ var createUserCmd = &cobra.Command{
 		}

 		if pictureURL, _ := cmd.Flags().GetString("picture-url"); pictureURL != "" {
-			if _, err := url.Parse(pictureURL); err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf(
-						"Invalid Picture URL: %s",
-						err,
-					),
-					output,
-				)
-				return
+			if _, err := url.Parse(pictureURL); err != nil { //nolint:noinlineerr
+				return fmt.Errorf("invalid picture URL: %w", err)
 			}
+
 			request.PictureUrl = pictureURL
 		}

-		err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			log.Trace().Interface("client", client).Msg("Obtained gRPC client")
-			log.Trace().Interface("request", request).Msg("Sending CreateUser request")
+		log.Trace().Interface(zf.Request, request).Msg("sending CreateUser request")

-			response, err := client.CreateUser(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					"Cannot create user: "+status.Convert(err).Message(),
-					output,
-				)
-				return err
-			}
-
-			SuccessOutput(response.GetUser(), "User created", output)
-			return nil
-		})
+		response, err := client.CreateUser(ctx, request)
 		if err != nil {
-			return
+			return fmt.Errorf("creating user: %w", err)
 		}
-	},
+
+		return printOutput(cmd, response.GetUser(), "User created")
+	}),
 }

 var destroyUserCmd = &cobra.Command{
-	Use:     "destroy --user USER",
+	Use:     "destroy --identifier ID or --name NAME",
 	Short:   "Destroys a user",
 	Aliases: []string{"delete"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
-
-		id := userIDFromFlag(cmd)
-		request := &v1.ListUsersRequest{
-			Id: id,
-		}
-
-		var user *v1.User
-		err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			users, err := client.ListUsers(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					"Error: "+status.Convert(err).Message(),
-					output,
-				)
-				return err
-			}
-
-			if len(users.GetUsers()) != 1 {
-				err := errors.New("Unable to determine user to delete, query returned multiple users, use ID")
-				ErrorOutput(
-					err,
-					"Error: "+status.Convert(err).Message(),
-					output,
-				)
-				return err
-			}
-
-			user = users.GetUsers()[0]
-			return nil
-		})
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		id, username, err := usernameAndIDFromFlag(cmd)
 		if err != nil {
-			return
+			return err
 		}

-		confirm := false
-		force, _ := cmd.Flags().GetBool("force")
-		if !force {
-			prompt := &survey.Confirm{
-				Message: fmt.Sprintf(
-					"Do you want to remove the user %q (%d) and any associated preauthkeys?",
-					user.GetName(), user.GetId(),
-				),
-			}
-			err := survey.AskOne(prompt, &confirm)
-			if err != nil {
-				return
-			}
+		request := &v1.ListUsersRequest{
+			Name: username,
+			Id:   id,
 		}

-		if confirm || force {
-			err = WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-				request := &v1.DeleteUserRequest{Id: user.GetId()}
-
-				response, err := client.DeleteUser(ctx, request)
-				if err != nil {
-					ErrorOutput(
-						err,
-						"Cannot destroy user: "+status.Convert(err).Message(),
-						output,
-					)
-					return err
-				}
-				SuccessOutput(response, "User destroyed", output)
-				return nil
-			})
-			if err != nil {
-				return
-			}
-		} else {
-			SuccessOutput(map[string]string{"Result": "User not destroyed"}, "User not destroyed", output)
+		users, err := client.ListUsers(ctx, request)
+		if err != nil {
+			return fmt.Errorf("listing users: %w", err)
 		}
-	},
+
+		if len(users.GetUsers()) != 1 {
+			return errMultipleUsersMatch
+		}
+
+		user := users.GetUsers()[0]
+
+		if !confirmAction(cmd, fmt.Sprintf(
+			"Do you want to remove the user %q (%d) and any associated preauthkeys?",
+			user.GetName(), user.GetId(),
+		)) {
+			return printOutput(cmd, map[string]string{"Result": "User not destroyed"}, "User not destroyed")
+		}
+
+		deleteRequest := &v1.DeleteUserRequest{Id: user.GetId()}
+
+		response, err := client.DeleteUser(ctx, deleteRequest)
+		if err != nil {
+			return fmt.Errorf("destroying user: %w", err)
+		}
+
+		return printOutput(cmd, response, "User destroyed")
+	}),
 }

 var listUsersCmd = &cobra.Command{
 	Use:     "list",
 	Short:   "List all the users",
 	Aliases: []string{"ls", "show"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		request := &v1.ListUsersRequest{}

-		err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			request := &v1.ListUsersRequest{}
+		id, _ := cmd.Flags().GetInt64("identifier")
+		username, _ := cmd.Flags().GetString("name")
+		email, _ := cmd.Flags().GetString("email")

-			// Check for smart lookup flag first
-			userFlag, _ := cmd.Flags().GetString("user")
-			if userFlag != "" {
-				// Use smart lookup to determine filter type
-				if id, err := strconv.ParseUint(userFlag, 10, 64); err == nil && id > 0 {
-					request.Id = id
-				} else if strings.Contains(userFlag, "@") {
-					request.Email = userFlag
-				} else {
-					request.Name = userFlag
-				}
-			} else {
-				// Check specific filter flags
-				if id, _ := cmd.Flags().GetUint64("id"); id > 0 {
-					request.Id = id
-				} else if name, _ := cmd.Flags().GetString("name"); name != "" {
-					request.Name = name
-				} else if email, _ := cmd.Flags().GetString("email"); email != "" {
-					request.Email = email
-				}
-			}
+		// Filter by at most one parameter; precedence is ID, then name, then email.
+		switch {
+		case id > 0:
+			request.Id = uint64(id)
+		case username != "":
+			request.Name = username
+		case email != "":
+			request.Email = email
+		}

-			response, err := client.ListUsers(ctx, request)
-			if err != nil {
-				ErrorOutput(
-					err,
-					"Cannot get users: "+status.Convert(err).Message(),
-					output,
-				)
-				return err
-			}
-
-			if output != "" {
-				SuccessOutput(response.GetUsers(), "", output)
-				return nil
-			}
+		response, err := client.ListUsers(ctx, request)
+		if err != nil {
+			return fmt.Errorf("listing users: %w", err)
+		}

+		return printListOutput(cmd, response.GetUsers(), func() error {
 			tableData := pterm.TableData{{"ID", "Name", "Username", "Email", "Created"}}
 			for _, user := range response.GetUsers() {
 				tableData = append(
 					tableData,
 					[]string{
-						strconv.FormatUint(user.GetId(), 10),
+						strconv.FormatUint(user.GetId(), util.Base10),
 						user.GetDisplayName(),
 						user.GetName(),
 						user.GetEmail(),
@@ -269,80 +196,48 @@ var listUsersCmd = &cobra.Command{
 					},
 				)
 			}
-			tableData = FilterTableColumns(cmd, tableData)
-			err = pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
-			if err != nil {
-				ErrorOutput(
-					err,
-					fmt.Sprintf("Failed to render pterm table: %s", err),
-					output,
-				)
-				return err
-			}
-			return nil
+
+			return pterm.DefaultTable.WithHasHeader().WithData(tableData).Render()
 		})
-		if err != nil {
-			// Error already handled in closure
-			return
-		}
-	},
+	}),
 }

 var renameUserCmd = &cobra.Command{
 	Use:     "rename",
 	Short:   "Renames a user",
 	Aliases: []string{"mv"},
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
+	RunE: grpcRunE(func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error {
+		id, username, err := usernameAndIDFromFlag(cmd)
+		if err != nil {
+			return err
+		}
+
+		listReq := &v1.ListUsersRequest{
+			Name: username,
+			Id:   id,
+		}
+
+		users, err := client.ListUsers(ctx, listReq)
+		if err != nil {
+			return fmt.Errorf("listing users: %w", err)
+		}
+
+		if len(users.GetUsers()) != 1 {
+			return errMultipleUsersMatch
+		}

-		id := userIDFromFlag(cmd)
 		newName, _ := cmd.Flags().GetString("new-name")

-		err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-			listReq := &v1.ListUsersRequest{
-				Id: id,
-			}
-
-			users, err := client.ListUsers(ctx, listReq)
-			if err != nil {
-				ErrorOutput(
-					err,
-					"Error: "+status.Convert(err).Message(),
-					output,
-				)
-				return err
-			}
-
-			if len(users.GetUsers()) != 1 {
-				err := errors.New("Unable to determine user to delete, query returned multiple users, use ID")
-				ErrorOutput(
-					err,
-					"Error: "+status.Convert(err).Message(),
-					output,
-				)
-				return err
-			}
-
-			renameReq := &v1.RenameUserRequest{
-				OldId:   id,
-				NewName: newName,
-			}
-
-			response, err := client.RenameUser(ctx, renameReq)
-			if err != nil {
-				ErrorOutput(
-					err,
-					"Cannot rename user: "+status.Convert(err).Message(),
-					output,
-				)
-				return err
-			}
-
-			SuccessOutput(response.GetUser(), "User renamed", output)
-			return nil
-		})
-		if err != nil {
-			return
+		renameReq := &v1.RenameUserRequest{
+			OldId:   users.GetUsers()[0].GetId(),
+			NewName: newName,
 		}
-	},
+
+		response, err := client.RenameUser(ctx, renameReq)
+		if err != nil {
+			return fmt.Errorf("renaming user: %w", err)
+		}
+
+		return printOutput(cmd, response.GetUser(), "User renamed")
+	}),
 }
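Note on the pattern used in users.go above: each command passes its body to `grpcRunE`, which supplies a ready client and owns connection teardown, so the body only returns errors. The sketch below shows that wrapper shape with the standard library only. It is an illustration under assumptions, not the headscale implementation: `fakeClient`, `dial`, `withClient`, `errDial` and `lastClient` are hypothetical names that stand in for the gRPC client and `newHeadscaleCLIWithConfig`.

```go
package main

import (
	"errors"
	"fmt"
)

// errDial is a hypothetical sentinel returned when the fake dial fails.
var errDial = errors.New("dial failed")

// fakeClient is a hypothetical stand-in for v1.HeadscaleServiceClient plus its conn.
type fakeClient struct{ closed bool }

// Close marks the client as closed, mimicking conn.Close().
func (c *fakeClient) Close() { c.closed = true }

// lastClient records the most recently dialled client so the sketch can inspect it.
var lastClient *fakeClient

// dial is a hypothetical stand-in for newHeadscaleCLIWithConfig.
func dial(fail bool) (*fakeClient, error) {
	if fail {
		return nil, errDial
	}

	lastClient = &fakeClient{}

	return lastClient, nil
}

// withClient wraps fn so that it receives a ready client. The wrapper owns
// setup and teardown; setup failures are returned as wrapped errors.
func withClient(fail bool, fn func(c *fakeClient, args []string) error) func(args []string) error {
	return func(args []string) error {
		c, err := dial(fail)
		if err != nil {
			return fmt.Errorf("connecting: %w", err)
		}
		defer c.Close()

		return fn(c, args)
	}
}

func main() {
	run := withClient(false, func(c *fakeClient, args []string) error {
		fmt.Println("closed during call:", c.closed)
		return nil
	})
	fmt.Println("err:", run(nil))
	fmt.Println("closed after call:", lastClient.closed)

	// A failed dial surfaces as an error value; fn is never invoked.
	fmt.Println("err:", withClient(true, nil)(nil))
}
```

Because the wrapper returns the error rather than exiting, a single caller (Execute(), per the printError comment in utils.go) can decide how to format it.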
--- a/cmd/headscale/cli/utils.go
+++ b/cmd/headscale/cli/utils.go
@@ -4,24 +4,52 @@ import (
 	"context"
 	"crypto/tls"
 	"encoding/json"
+	"errors"
 	"fmt"
-	"net"
 	"os"
-	"strconv"
-	"strings"
+	"time"

 	v1 "github.com/juanfont/headscale/gen/go/headscale/v1"
 	"github.com/juanfont/headscale/hscontrol"
 	"github.com/juanfont/headscale/hscontrol/types"
 	"github.com/juanfont/headscale/hscontrol/util"
+	"github.com/juanfont/headscale/hscontrol/util/zlog/zf"
+	"github.com/prometheus/common/model"
 	"github.com/rs/zerolog/log"
 	"github.com/spf13/cobra"
 	"google.golang.org/grpc"
 	"google.golang.org/grpc/credentials"
 	"google.golang.org/grpc/credentials/insecure"
+	"google.golang.org/protobuf/types/known/timestamppb"
 	"gopkg.in/yaml.v3"
 )

+const (
+	HeadscaleDateTimeFormat = "2006-01-02 15:04:05"
+	SocketWritePermissions  = 0o666
+
+	outputFormatJSON     = "json"
+	outputFormatJSONLine = "json-line"
+	outputFormatYAML     = "yaml"
+)
+
+var (
+	errAPIKeyNotSet     = errors.New("HEADSCALE_CLI_API_KEY environment variable needs to be set")
+	errMissingParameter = errors.New("missing parameters")
+)
+
+// mustMarkRequired marks the named flags as required on cmd, panicking
+// if any name does not match a registered flag.  This is only called
+// from init() where a failure indicates a programming error.
+func mustMarkRequired(cmd *cobra.Command, names ...string) {
+	for _, n := range names {
+		err := cmd.MarkFlagRequired(n)
+		if err != nil {
+			panic(fmt.Sprintf("marking flag %q required on %q: %v", n, cmd.Name(), err))
+		}
+	}
+}
+
 func newHeadscaleServerWithConfig() (*hscontrol.Headscale, error) {
 	cfg, err := types.LoadServerConfig()
 	if err != nil {
@@ -39,14 +67,28 @@ func newHeadscaleServerWithConfig() (*hscontrol.Headscale, error) {
 	return app, nil
 }

-func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *grpc.ClientConn, context.CancelFunc) {
+// grpcRunE wraps a cobra RunE func, injecting a ready gRPC client and
+// context. Connection lifecycle is managed by the wrapper — callers
+// never see the underlying conn or cancel func.
+func grpcRunE(
+	fn func(ctx context.Context, client v1.HeadscaleServiceClient, cmd *cobra.Command, args []string) error,
+) func(*cobra.Command, []string) error {
+	return func(cmd *cobra.Command, args []string) error {
+		ctx, client, conn, cancel, err := newHeadscaleCLIWithConfig()
+		if err != nil {
+			return fmt.Errorf("connecting to headscale: %w", err)
+		}
+		defer cancel()
+		defer conn.Close()
+
+		return fn(ctx, client, cmd, args)
+	}
+}
+
+func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *grpc.ClientConn, context.CancelFunc, error) {
 	cfg, err := types.LoadCLIConfig()
 	if err != nil {
-		log.Fatal().
-			Err(err).
-			Caller().
-			Msgf("Failed to load configuration")
-		os.Exit(-1) // we get here if logging is suppressed (i.e., json output)
+		return nil, nil, nil, nil, fmt.Errorf("loading configuration: %w", err)
 	}

 	log.Debug().
@@ -56,7 +98,7 @@ func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *g
 	ctx, cancel := context.WithTimeout(context.Background(), cfg.CLI.Timeout)

 	grpcOptions := []grpc.DialOption{
-		grpc.WithBlock(),
+		grpc.WithBlock(), //nolint:staticcheck // SA1019: deprecated but supported in 1.x
 	}

 	address := cfg.CLI.Address
@@ -70,17 +112,23 @@ func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *g
 		address = cfg.UnixSocket

 		// Try to give the user better feedback if we cannot write to the headscale
-		// socket.
-		socket, err := os.OpenFile(cfg.UnixSocket, os.O_WRONLY, 0o666) // nolint
+		// socket.  Note: os.OpenFile on a Unix domain socket returns ENXIO on
+		// Linux which is expected — only permission errors are actionable here.
+		// The actual gRPC connection uses net.Dial which handles sockets properly.
+		socket, err := os.OpenFile(cfg.UnixSocket, os.O_WRONLY, SocketWritePermissions) //nolint
 		if err != nil {
 			if os.IsPermission(err) {
-				log.Fatal().
-					Err(err).
-					Str("socket", cfg.UnixSocket).
-					Msgf("Unable to read/write to headscale socket, do you have the correct permissions?")
+				cancel()
+
+				return nil, nil, nil, nil, fmt.Errorf(
+					"unable to read/write to headscale socket %q, do you have the correct permissions? %w",
+					cfg.UnixSocket,
+					err,
+				)
 			}
+		} else {
+			socket.Close()
 		}
-		socket.Close()

 		grpcOptions = append(
 			grpcOptions,
@@ -91,8 +139,11 @@ func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *g
 		// If we are not connecting to a local server, require an API key for authentication
 		apiKey := cfg.CLI.APIKey
 		if apiKey == "" {
-			log.Fatal().Caller().Msgf("HEADSCALE_CLI_API_KEY environment variable needs to be set.")
+			cancel()
+
+			return nil, nil, nil, nil, errAPIKeyNotSet
 		}
+
 		grpcOptions = append(grpcOptions,
 			grpc.WithPerRPCCredentials(tokenAuth{
 				token: apiKey,
@@ -117,64 +168,136 @@ func newHeadscaleCLIWithConfig() (context.Context, v1.HeadscaleServiceClient, *g
 		}
 	}

-	log.Trace().Caller().Str("address", address).Msg("Connecting via gRPC")
-	conn, err := grpc.DialContext(ctx, address, grpcOptions...)
+	log.Trace().Caller().Str(zf.Address, address).Msg("connecting via gRPC")
+
+	conn, err := grpc.DialContext(ctx, address, grpcOptions...) //nolint:staticcheck // SA1019: deprecated but supported in 1.x
 	if err != nil {
-		log.Fatal().Caller().Err(err).Msgf("Could not connect: %v", err)
-		os.Exit(-1) // we get here if logging is suppressed (i.e., json output)
+		cancel()
+
+		return nil, nil, nil, nil, fmt.Errorf("connecting to %s: %w", address, err)
 	}

 	client := v1.NewHeadscaleServiceClient(conn)

-	return ctx, client, conn, cancel
+	return ctx, client, conn, cancel, nil
 }

-func output(result interface{}, override string, outputFormat string) string {
-	var jsonBytes []byte
-	var err error
+// formatOutput serialises result into the requested format. For the
+// default (empty) format the human-readable override string is returned.
+func formatOutput(result any, override string, outputFormat string) (string, error) {
 	switch outputFormat {
-	case "json":
-		jsonBytes, err = json.MarshalIndent(result, "", "\t")
+	case outputFormatJSON:
+		b, err := json.MarshalIndent(result, "", "\t")
 		if err != nil {
-			log.Fatal().Err(err).Msg("failed to unmarshal output")
+			return "", fmt.Errorf("marshalling JSON output: %w", err)
 		}
-	case "json-line":
-		jsonBytes, err = json.Marshal(result)
+
+		return string(b), nil
+	case outputFormatJSONLine:
+		b, err := json.Marshal(result)
 		if err != nil {
-			log.Fatal().Err(err).Msg("failed to unmarshal output")
+			return "", fmt.Errorf("marshalling JSON-line output: %w", err)
 		}
-	case "yaml":
-		jsonBytes, err = yaml.Marshal(result)
+
+		return string(b), nil
+	case outputFormatYAML:
+		b, err := yaml.Marshal(result)
 		if err != nil {
-			log.Fatal().Err(err).Msg("failed to unmarshal output")
+			return "", fmt.Errorf("marshalling YAML output: %w", err)
 		}
+
+		return string(b), nil
 	default:
-		// nolint
-		return override
+		return override, nil
+	}
+}
+
+// printOutput formats result and writes it to stdout. It reads the --output
+// flag from cmd to decide the serialisation format.
+func printOutput(cmd *cobra.Command, result any, override string) error {
+	format, _ := cmd.Flags().GetString("output")
+
+	out, err := formatOutput(result, override, format)
+	if err != nil {
+		return err
 	}

-	return string(jsonBytes)
+	fmt.Println(out)
+
+	return nil
 }

-// SuccessOutput prints the result to stdout and exits with status code 0.
-func SuccessOutput(result interface{}, override string, outputFormat string) {
-	fmt.Println(output(result, override, outputFormat))
-	os.Exit(0)
+// expirationFromFlag parses the --expiration flag as a Prometheus-style
+// duration (e.g. "90d", "1h") and returns an absolute timestamp.
+func expirationFromFlag(cmd *cobra.Command) (*timestamppb.Timestamp, error) {
+	durationStr, _ := cmd.Flags().GetString("expiration")
+
+	duration, err := model.ParseDuration(durationStr)
+	if err != nil {
+		return nil, fmt.Errorf("parsing duration: %w", err)
+	}
+
+	return timestamppb.New(time.Now().UTC().Add(time.Duration(duration))), nil
 }

-// ErrorOutput prints an error message to stderr and exits with status code 1.
-func ErrorOutput(errResult error, override string, outputFormat string) {
+// confirmAction returns true when the user confirms a prompt, or when
+// --force is set.  Callers decide what to do when it returns false.
+func confirmAction(cmd *cobra.Command, prompt string) bool {
+	force, _ := cmd.Flags().GetBool("force")
+	if force {
+		return true
+	}
+
+	return util.YesNo(prompt)
+}
+
+// printListOutput checks the --output flag: when a machine-readable format is
+// requested it serialises data as JSON/YAML; otherwise it calls renderTable
+// to produce the human-readable pterm table.
+func printListOutput(
+	cmd *cobra.Command,
+	data any,
+	renderTable func() error,
+) error {
+	format, _ := cmd.Flags().GetString("output")
+	if format != "" {
+		return printOutput(cmd, data, "")
+	}
+
+	return renderTable()
+}
+
+// printError writes err to stderr, formatting it as JSON/YAML when the
+// --output flag requests machine-readable output.  Used exclusively by
+// Execute() so that every error surfaces in the format the caller asked for.
+func printError(err error, outputFormat string) {
 	type errOutput struct {
 		Error string `json:"error"`
 	}

-	fmt.Fprintf(os.Stderr, "%s\n", output(errOutput{errResult.Error()}, override, outputFormat))
-	os.Exit(1)
+	e := errOutput{Error: err.Error()}
+
+	var formatted []byte
+
+	switch outputFormat {
+	case outputFormatJSON:
+		formatted, _ = json.MarshalIndent(e, "", "\t") //nolint:errchkjson // errOutput contains only a string field
+	case outputFormatJSONLine:
+		formatted, _ = json.Marshal(e) //nolint:errchkjson // errOutput contains only a string field
+	case outputFormatYAML:
+		formatted, _ = yaml.Marshal(e)
+	default:
+		fmt.Fprintf(os.Stderr, "Error: %s\n", err)
+
+		return
+	}
+
+	fmt.Fprintf(os.Stderr, "%s\n", formatted)
 }

-func HasMachineOutputFlag() bool {
+func hasMachineOutputFlag() bool {
 	for _, arg := range os.Args {
-		if arg == "json" || arg == "json-line" || arg == "yaml" {
+		if arg == outputFormatJSON || arg == outputFormatJSONLine || arg == outputFormatYAML {
 			return true
 		}
 	}
@@ -199,152 +322,3 @@ func (t tokenAuth) GetRequestMetadata(
 func (tokenAuth) RequireTransportSecurity() bool {
 	return true
 }
-
-// GetOutputFlag returns the output flag value (never fails)
-func GetOutputFlag(cmd *cobra.Command) string {
-	output, _ := cmd.Flags().GetString("output")
-	return output
-}
-
-
-// GetNodeIdentifier returns the node ID using smart lookup via gRPC ListNodes call
-func GetNodeIdentifier(cmd *cobra.Command) (uint64, error) {
-	nodeFlag, _ := cmd.Flags().GetString("node")
-
-	// Use --node flag
-	if nodeFlag == "" {
-		return 0, fmt.Errorf("--node flag is required")
-	}
-
-	// Use smart lookup via gRPC
-	return lookupNodeBySpecifier(nodeFlag)
-}
-
-// lookupNodeBySpecifier performs smart lookup of a node by ID, name, hostname, or IP
-func lookupNodeBySpecifier(specifier string) (uint64, error) {
-	var nodeID uint64
-
-	err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-		request := &v1.ListNodesRequest{}
-
-		// Detect what type of specifier this is and set appropriate filter
-		if id, err := strconv.ParseUint(specifier, 10, 64); err == nil && id > 0 {
-			// Looks like a numeric ID
-			request.Id = id
-		} else if isIPAddress(specifier) {
-			// Looks like an IP address
-			request.IpAddresses = []string{specifier}
-		} else {
-			// Treat as hostname/name
-			request.Name = specifier
-		}
-
-		response, err := client.ListNodes(ctx, request)
-		if err != nil {
-			return fmt.Errorf("failed to lookup node: %w", err)
-		}
-
-		nodes := response.GetNodes()
-		if len(nodes) == 0 {
-			return fmt.Errorf("node not found")
-		}
-
-		if len(nodes) > 1 {
-			var nodeInfo []string
-			for _, node := range nodes {
-				nodeInfo = append(nodeInfo, fmt.Sprintf("ID=%d name=%s", node.GetId(), node.GetName()))
-			}
-			return fmt.Errorf("multiple nodes found matching '%s': %s", specifier, strings.Join(nodeInfo, ", "))
-		}
-
-		// Exactly one match - this is what we want
-		nodeID = nodes[0].GetId()
-		return nil
-	})
-	if err != nil {
-		return 0, err
-	}
-
-	return nodeID, nil
-}
-
-// isIPAddress checks if a string looks like an IP address
-func isIPAddress(s string) bool {
-	// Try parsing as IP address (both IPv4 and IPv6)
-	if net.ParseIP(s) != nil {
-		return true
-	}
-	// Try parsing as CIDR
-	if _, _, err := net.ParseCIDR(s); err == nil {
-		return true
-	}
-	return false
-}
-
-// GetUserIdentifier returns the user ID using smart lookup via gRPC ListUsers call
-func GetUserIdentifier(cmd *cobra.Command) (uint64, error) {
-	userFlag, _ := cmd.Flags().GetString("user")
-	nameFlag, _ := cmd.Flags().GetString("name")
-
-	var specifier string
-
-	// Determine which flag was used (prefer --user, fall back to legacy flags)
-	if userFlag != "" {
-		specifier = userFlag
-	} else if nameFlag != "" {
-		specifier = nameFlag
-	} else {
-		return 0, fmt.Errorf("--user flag is required")
-	}
-
-	// Use smart lookup via gRPC
-	return lookupUserBySpecifier(specifier)
-}
-
-// lookupUserBySpecifier performs smart lookup of a user by ID, name, or email
-func lookupUserBySpecifier(specifier string) (uint64, error) {
-	var userID uint64
-
-	err := WithClient(func(ctx context.Context, client v1.HeadscaleServiceClient) error {
-		request := &v1.ListUsersRequest{}
-
-		// Detect what type of specifier this is and set appropriate filter
-		if id, err := strconv.ParseUint(specifier, 10, 64); err == nil && id > 0 {
-			// Looks like a numeric ID
-			request.Id = id
-		} else if strings.Contains(specifier, "@") {
-			// Looks like an email address
-			request.Email = specifier
-		} else {
-			// Treat as username
-			request.Name = specifier
-		}
-
-		response, err := client.ListUsers(ctx, request)
-		if err != nil {
-			return fmt.Errorf("failed to lookup user: %w", err)
-		}
-
-		users := response.GetUsers()
-		if len(users) == 0 {
-			return fmt.Errorf("user not found")
-		}
-
-		if len(users) > 1 {
-			var userInfo []string
-			for _, user := range users {
-				userInfo = append(userInfo, fmt.Sprintf("ID=%d name=%s email=%s", user.GetId(), user.GetName(), user.GetEmail()))
-			}
-			return fmt.Errorf("multiple users found matching '%s': %s", specifier, strings.Join(userInfo, ", "))
-		}
-
-		// Exactly one match - this is what we want
-		userID = users[0].GetId()
-		return nil
-	})
-	if err != nil {
-		return 0, err
-	}
-
-	return userID, nil
-}
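Note on `formatOutput` above: it now returns an error instead of calling `log.Fatal`, which makes it straightforward to exercise directly. The sketch below mirrors only its JSON branches using the standard library; the YAML branch is left out to avoid the external dependency. `formatJSONOrOverride` is a hypothetical name for this illustration, not a function in the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// formatJSONOrOverride mirrors the shape of formatOutput for the two JSON
// formats; any other format falls through to the human-readable override.
func formatJSONOrOverride(result any, override, format string) (string, error) {
	switch format {
	case "json":
		b, err := json.MarshalIndent(result, "", "\t")
		if err != nil {
			return "", fmt.Errorf("marshalling JSON output: %w", err)
		}

		return string(b), nil
	case "json-line":
		b, err := json.Marshal(result)
		if err != nil {
			return "", fmt.Errorf("marshalling JSON-line output: %w", err)
		}

		return string(b), nil
	default:
		return override, nil
	}
}

func main() {
	data := map[string]string{"name": "test", "id": "123"}

	for _, format := range []string{"", "json-line", "json"} {
		out, err := formatJSONOrOverride(data, "Human readable", format)
		fmt.Printf("%q -> %s (err: %v)\n", format, out, err)
	}

	// Channels cannot be marshalled to JSON, so this yields an error value
	// that the caller can report, rather than terminating the process.
	_, err := formatJSONOrOverride(make(chan int), "", "json")
	fmt.Println("unsupported value:", err)
}
```

The inputs are the same map used by the test cases removed from utils_test.go below, so the JSON results can be compared against those expectations.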
--- a/cmd/headscale/cli/utils_test.go
+++ b/cmd/headscale/cli/utils_test.go
@@ -1,175 +0,0 @@
-package cli
-
-import (
-	"os"
-	"testing"
-
-	"github.com/stretchr/testify/assert"
-)
-
-func TestHasMachineOutputFlag(t *testing.T) {
-	tests := []struct {
-		name     string
-		args     []string
-		expected bool
-	}{
-		{
-			name:     "no machine output flags",
-			args:     []string{"headscale", "users", "list"},
-			expected: false,
-		},
-		{
-			name:     "json flag present",
-			args:     []string{"headscale", "users", "list", "json"},
-			expected: true,
-		},
-		{
-			name:     "json-line flag present",
-			args:     []string{"headscale", "nodes", "list", "json-line"},
-			expected: true,
-		},
-		{
-			name:     "yaml flag present",
-			args:     []string{"headscale", "apikeys", "list", "yaml"},
-			expected: true,
-		},
-		{
-			name:     "mixed flags with json",
-			args:     []string{"headscale", "--config", "/tmp/config.yaml", "users", "list", "json"},
-			expected: true,
-		},
-		{
-			name:     "flag as part of longer argument",
-			args:     []string{"headscale", "users", "create", "json-user@example.com"},
-			expected: false,
-		},
-	}
-
-	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
-			// Save original os.Args
-			originalArgs := os.Args
-			defer func() { os.Args = originalArgs }()
-
-			// Set os.Args to test case
-			os.Args = tt.args
-
-			result := HasMachineOutputFlag()
-			assert.Equal(t, tt.expected, result)
-		})
-	}
-}
-
-func TestOutput(t *testing.T) {
-	tests := []struct {
-		name         string
-		result       interface{}
-		override     string
-		outputFormat string
-		expected     string
-	}{
-		{
-			name:         "default format returns override",
-			result:       map[string]string{"test": "value"},
-			override:     "Human readable output",
-			outputFormat: "",
-			expected:     "Human readable output",
-		},
-		{
-			name:         "default format with empty override",
-			result:       map[string]string{"test": "value"},
-			override:     "",
-			outputFormat: "",
-			expected:     "",
-		},
-		{
-			name:         "json format",
-			result:       map[string]string{"name": "test", "id": "123"},
-			override:     "Human readable",
-			outputFormat: "json",
-			expected:     "{\n\t\"id\": \"123\",\n\t\"name\": \"test\"\n}",
-		},
-		{
-			name:         "json-line format",
-			result:       map[string]string{"name": "test", "id": "123"},
-			override:     "Human readable",
-			outputFormat: "json-line",
-			expected:     "{\"id\":\"123\",\"name\":\"test\"}",
-		},
-		{
-			name:         "yaml format",
-			result:       map[string]string{"name": "test", "id": "123"},
-			override:     "Human readable",
-			outputFormat: "yaml",
-			expected:     "id: \"123\"\nname: test\n",
-		},
-		{
-			name:         "invalid format returns override",
-			result:       map[string]string{"test": "value"},
-			override:     "Human readable output",
-			outputFormat: "invalid",
-			expected:     "Human readable output",
-		},
-	}
-
-	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
-			result := output(tt.result, tt.override, tt.outputFormat)
-			assert.Equal(t, tt.expected, result)
-		})
-	}
-}
-
-func TestOutputWithComplexData(t *testing.T) {
-	// Test with more complex data structures
-	complexData := struct {
-		Users []struct {
-			Name string `json:"name" yaml:"name"`
-			ID   int    `json:"id" yaml:"id"`
-		} `json:"users" yaml:"users"`
-	}{
-		Users: []struct {
-			Name string `json:"name" yaml:"name"`
-			ID   int    `json:"id" yaml:"id"`
-		}{
-			{Name: "user1", ID: 1},
-			{Name: "user2", ID: 2},
-		},
-	}
-
-	// Test JSON output
-	jsonResult := output(complexData, "override", "json")
-	assert.Contains(t, jsonResult, "\"users\":")
-	assert.Contains(t, jsonResult, "\"name\": \"user1\"")
-	assert.Contains(t, jsonResult, "\"id\": 1")
-
-	// Test YAML output
-	yamlResult := output(complexData, "override", "yaml")
-	assert.Contains(t, yamlResult, "users:")
-	assert.Contains(t, yamlResult, "name: user1")
-	assert.Contains(t, yamlResult, "id: 1")
-}
-
-func TestOutputWithNilData(t *testing.T) {
-	// Test with nil data
-	result := output(nil, "fallback", "json")
-	assert.Equal(t, "null", result)
-
-	result = output(nil, "fallback", "yaml")
-	assert.Equal(t, "null\n", result)
-
-	result = output(nil, "fallback", "")
-	assert.Equal(t, "fallback", result)
-}
-
-func TestOutputWithEmptyData(t *testing.T) {
-	// Test with empty slice
-	emptySlice := []string{}
-	result := output(emptySlice, "fallback", "json")
-	assert.Equal(t, "[]", result)
-
-	// Test with empty map
-	emptyMap := map[string]string{}
-	result = output(emptyMap, "fallback", "json")
-	assert.Equal(t, "{}", result)
-}
--- a/cmd/headscale/cli/version.go
+++ b/cmd/headscale/cli/version.go
@@ -7,17 +7,16 @@ import (

 func init() {
 	rootCmd.AddCommand(versionCmd)
+	versionCmd.Flags().StringP("output", "o", "", "Output format. Empty for human-readable, 'json', 'json-line' or 'yaml'")
 }

 var versionCmd = &cobra.Command{
 	Use:   "version",
-	Short: "Print the version",
-	Long:  "The version of headscale",
-	Run: func(cmd *cobra.Command, args []string) {
-		output := GetOutputFlag(cmd)
-		SuccessOutput(map[string]string{
-			"version": types.Version,
-			"commit":  types.GitCommitHash,
-		}, types.Version, output)
+	Short: "Print the version.",
+	Long:  "The version of headscale.",
+	RunE: func(cmd *cobra.Command, args []string) error {
+		info := types.GetVersionInfo()
+
+		return printOutput(cmd, info, info.String())
 	},
 }
--- a/cmd/headscale/cli/version_test.go
+++ b/cmd/headscale/cli/version_test.go
@@ -1,45 +0,0 @@
-package cli
-
-import (
-	"testing"
-
-	"github.com/stretchr/testify/assert"
-)
-
-func TestVersionCommand(t *testing.T) {
-	// Test that version command exists
-	assert.NotNil(t, versionCmd)
-	assert.Equal(t, "version", versionCmd.Use)
-	assert.Equal(t, "Print the version.", versionCmd.Short)
-	assert.Equal(t, "The version of headscale.", versionCmd.Long)
-}
-
-func TestVersionCommandStructure(t *testing.T) {
-	// Test command is properly added to root
-	found := false
-	for _, cmd := range rootCmd.Commands() {
-		if cmd.Use == "version" {
-			found = true
-			break
-		}
-	}
-	assert.True(t, found, "version command should be added to root command")
-}
-
-func TestVersionCommandFlags(t *testing.T) {
-	// Version command should inherit output flag from root as persistent flag
-	outputFlag := versionCmd.Flag("output")
-	if outputFlag == nil {
-		// Try persistent flags from root
-		outputFlag = rootCmd.PersistentFlags().Lookup("output")
-	}
-	assert.NotNil(t, outputFlag, "version command should have access to output flag")
-}
-
-func TestVersionCommandRun(t *testing.T) {
-	// Test that Run function is set
-	assert.NotNil(t, versionCmd.Run)
-
-	// We can't easily test the actual execution without mocking SuccessOutput
-	// but we can verify the function exists and has the right signature
-}
--- a/cmd/headscale/headscale.go
+++ b/cmd/headscale/headscale.go
@@ -12,6 +12,7 @@ import (

 func main() {
 	var colors bool
+
 	switch l := termcolor.SupportLevel(os.Stderr); l {
 	case termcolor.Level16M:
 		colors = true
--- a/cmd/headscale/headscale_test.go
+++ b/cmd/headscale/headscale_test.go
@@ -9,34 +9,15 @@ import (
 	"github.com/juanfont/headscale/hscontrol/types"
 	"github.com/juanfont/headscale/hscontrol/util"
 	"github.com/spf13/viper"
-	"gopkg.in/check.v1"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
 )

-func Test(t *testing.T) {
-	check.TestingT(t)
-}
-
-var _ = check.Suite(&Suite{})
-
-type Suite struct{}
-
-func (s *Suite) SetUpSuite(c *check.C) {
-}
-
-func (s *Suite) TearDownSuite(c *check.C) {
-}
-
-func (*Suite) TestConfigFileLoading(c *check.C) {
-	tmpDir, err := os.MkdirTemp("", "headscale")
-	if err != nil {
-		c.Fatal(err)
-	}
-	defer os.RemoveAll(tmpDir)
+func TestConfigFileLoading(t *testing.T) {
+	tmpDir := t.TempDir()

 	path, err := os.Getwd()
-	if err != nil {
-		c.Fatal(err)
-	}
+	require.NoError(t, err)

 	cfgFile := filepath.Join(tmpDir, "config.yaml")

@@ -45,70 +26,52 @@ func (*Suite) TestConfigFileLoading(c *check.C) {
 		filepath.Clean(path+"/../../config-example.yaml"),
 		cfgFile,
 	)
-	if err != nil {
-		c.Fatal(err)
-	}
+	require.NoError(t, err)

 	// Load example config, it should load without validation errors
 	err = types.LoadConfig(cfgFile, true)
-	c.Assert(err, check.IsNil)
+	require.NoError(t, err)

 	// Test that config file was interpreted correctly
-	c.Assert(viper.GetString("server_url"), check.Equals, "http://127.0.0.1:8080")
-	c.Assert(viper.GetString("listen_addr"), check.Equals, "127.0.0.1:8080")
-	c.Assert(viper.GetString("metrics_listen_addr"), check.Equals, "127.0.0.1:9090")
-	c.Assert(viper.GetString("database.type"), check.Equals, "sqlite")
-	c.Assert(viper.GetString("database.sqlite.path"), check.Equals, "/var/lib/headscale/db.sqlite")
-	c.Assert(viper.GetString("tls_letsencrypt_hostname"), check.Equals, "")
-	c.Assert(viper.GetString("tls_letsencrypt_listen"), check.Equals, ":http")
-	c.Assert(viper.GetString("tls_letsencrypt_challenge_type"), check.Equals, "HTTP-01")
-	c.Assert(
-		util.GetFileMode("unix_socket_permission"),
-		check.Equals,
-		fs.FileMode(0o770),
-	)
-	c.Assert(viper.GetBool("logtail.enabled"), check.Equals, false)
+	assert.Equal(t, "http://127.0.0.1:8080", viper.GetString("server_url"))
+	assert.Equal(t, "127.0.0.1:8080", viper.GetString("listen_addr"))
+	assert.Equal(t, "127.0.0.1:9090", viper.GetString("metrics_listen_addr"))
+	assert.Equal(t, "sqlite", viper.GetString("database.type"))
+	assert.Equal(t, "/var/lib/headscale/db.sqlite", viper.GetString("database.sqlite.path"))
+	assert.Empty(t, viper.GetString("tls_letsencrypt_hostname"))
+	assert.Equal(t, ":http", viper.GetString("tls_letsencrypt_listen"))
+	assert.Equal(t, "HTTP-01", viper.GetString("tls_letsencrypt_challenge_type"))
+	assert.Equal(t, fs.FileMode(0o770), util.GetFileMode("unix_socket_permission"))
+	assert.False(t, viper.GetBool("logtail.enabled"))
 }

-func (*Suite) TestConfigLoading(c *check.C) {
-	tmpDir, err := os.MkdirTemp("", "headscale")
-	if err != nil {
-		c.Fatal(err)
-	}
-	defer os.RemoveAll(tmpDir)
+func TestConfigLoading(t *testing.T) {
+	tmpDir := t.TempDir()

 	path, err := os.Getwd()
-	if err != nil {
-		c.Fatal(err)
-	}
+	require.NoError(t, err)

 	// Symlink the example config file
 	err = os.Symlink(
 		filepath.Clean(path+"/../../config-example.yaml"),
 		filepath.Join(tmpDir, "config.yaml"),
 	)
-	if err != nil {
-		c.Fatal(err)
-	}
+	require.NoError(t, err)

 	// Load example config, it should load without validation errors
 	err = types.LoadConfig(tmpDir, false)
-	c.Assert(err, check.IsNil)
+	require.NoError(t, err)

 	// Test that config file was interpreted correctly
-	c.Assert(viper.GetString("server_url"), check.Equals, "http://127.0.0.1:8080")
-	c.Assert(viper.GetString("listen_addr"), check.Equals, "127.0.0.1:8080")
-	c.Assert(viper.GetString("metrics_listen_addr"), check.Equals, "127.0.0.1:9090")
-	c.Assert(viper.GetString("database.type"), check.Equals, "sqlite")
-	c.Assert(viper.GetString("database.sqlite.path"), check.Equals, "/var/lib/headscale/db.sqlite")
-	c.Assert(viper.GetString("tls_letsencrypt_hostname"), check.Equals, "")
-	c.Assert(viper.GetString("tls_letsencrypt_listen"), check.Equals, ":http")
-	c.Assert(viper.GetString("tls_letsencrypt_challenge_type"), check.Equals, "HTTP-01")
-	c.Assert(
-		util.GetFileMode("unix_socket_permission"),
-		check.Equals,
-		fs.FileMode(0o770),
-	)
-	c.Assert(viper.GetBool("logtail.enabled"), check.Equals, false)
-	c.Assert(viper.GetBool("randomize_client_port"), check.Equals, false)
+	assert.Equal(t, "http://127.0.0.1:8080", viper.GetString("server_url"))
+	assert.Equal(t, "127.0.0.1:8080", viper.GetString("listen_addr"))
+	assert.Equal(t, "127.0.0.1:9090", viper.GetString("metrics_listen_addr"))
+	assert.Equal(t, "sqlite", viper.GetString("database.type"))
+	assert.Equal(t, "/var/lib/headscale/db.sqlite", viper.GetString("database.sqlite.path"))
+	assert.Empty(t, viper.GetString("tls_letsencrypt_hostname"))
+	assert.Equal(t, ":http", viper.GetString("tls_letsencrypt_listen"))
+	assert.Equal(t, "HTTP-01", viper.GetString("tls_letsencrypt_challenge_type"))
+	assert.Equal(t, fs.FileMode(0o770), util.GetFileMode("unix_socket_permission"))
+	assert.False(t, viper.GetBool("logtail.enabled"))
+	assert.False(t, viper.GetBool("randomize_client_port"))
 }
--- a/cmd/hi/README.md
+++ b/cmd/hi/README.md
@@ -0,0 +1,6 @@
+# hi
+
+hi (headscale integration runner) is an entirely "vibe coded" wrapper around our
+[integration test suite](../integration). It runs the Docker commands for you
+and, as an added benefit, extracts resources such as logs and databases from
+the test run.
--- a/cmd/hi/cleanup.go
+++ b/cmd/hi/cleanup.go
@@ -3,9 +3,13 @@ package main
 import (
 	"context"
 	"fmt"
+	"log"
+	"os"
+	"path/filepath"
 	"strings"
 	"time"

+	"github.com/cenkalti/backoff/v5"
 	"github.com/docker/docker/api/types/container"
 	"github.com/docker/docker/api/types/filters"
 	"github.com/docker/docker/api/types/image"
@@ -14,30 +18,46 @@ import (
 )

 // cleanupBeforeTest performs cleanup operations before running tests.
+// Only removes stale (stopped/exited) test containers to avoid interfering with concurrent test runs.
 func cleanupBeforeTest(ctx context.Context) error {
-	if err := killTestContainers(ctx); err != nil {
-		return fmt.Errorf("failed to kill test containers: %w", err)
+	err := cleanupStaleTestContainers(ctx)
+	if err != nil {
+		return fmt.Errorf("cleaning stale test containers: %w", err)
 	}

-	if err := pruneDockerNetworks(ctx); err != nil {
-		return fmt.Errorf("failed to prune networks: %w", err)
+	if err := pruneDockerNetworks(ctx); err != nil { //nolint:noinlineerr
+		return fmt.Errorf("pruning networks: %w", err)
 	}

 	return nil
 }

-// cleanupAfterTest removes the test container after completion.
-func cleanupAfterTest(ctx context.Context, cli *client.Client, containerID string) error {
-	return cli.ContainerRemove(ctx, containerID, container.RemoveOptions{
+// cleanupAfterTest removes the test container and all associated integration test containers for the run.
+func cleanupAfterTest(ctx context.Context, cli *client.Client, containerID, runID string) error {
+	// Remove the main test container
+	err := cli.ContainerRemove(ctx, containerID, container.RemoveOptions{
 		Force: true,
 	})
+	if err != nil {
+		return fmt.Errorf("removing test container: %w", err)
+	}
+
+	// Clean up integration test containers for this run only
+	if runID != "" {
+		err := killTestContainersByRunID(ctx, runID)
+		if err != nil {
+			return fmt.Errorf("cleaning up containers for run %s: %w", runID, err)
+		}
+	}
+
+	return nil
 }

 // killTestContainers terminates and removes all test containers.
 func killTestContainers(ctx context.Context) error {
-	cli, err := createDockerClient()
+	cli, err := createDockerClient(ctx)
 	if err != nil {
-		return fmt.Errorf("failed to create Docker client: %w", err)
+		return fmt.Errorf("creating Docker client: %w", err)
 	}
 	defer cli.Close()

@@ -45,12 +65,14 @@ func killTestContainers(ctx context.Context) error {
 		All: true,
 	})
 	if err != nil {
-		return fmt.Errorf("failed to list containers: %w", err)
+		return fmt.Errorf("listing containers: %w", err)
 	}

 	removed := 0
+
 	for _, cont := range containers {
 		shouldRemove := false
+
 		for _, name := range cont.Names {
 			if strings.Contains(name, "headscale-test-suite") ||
 				strings.Contains(name, "hs-") ||
@@ -83,43 +105,135 @@ func killTestContainers(ctx context.Context) error {
 	return nil
 }

+// killTestContainersByRunID terminates and removes all test containers for a specific run ID.
+// This function filters containers by the hi.run-id label to only affect containers
+// belonging to the specified test run, leaving other concurrent test runs untouched.
+func killTestContainersByRunID(ctx context.Context, runID string) error {
+	cli, err := createDockerClient(ctx)
+	if err != nil {
+		return fmt.Errorf("creating Docker client: %w", err)
+	}
+	defer cli.Close()
+
+	// Filter containers by hi.run-id label
+	containers, err := cli.ContainerList(ctx, container.ListOptions{
+		All: true,
+		Filters: filters.NewArgs(
+			filters.Arg("label", "hi.run-id="+runID),
+		),
+	})
+	if err != nil {
+		return fmt.Errorf("listing containers for run %s: %w", runID, err)
+	}
+
+	removed := 0
+
+	for _, cont := range containers {
+		// Kill the container if it's running
+		if cont.State == "running" {
+			_ = cli.ContainerKill(ctx, cont.ID, "KILL")
+		}
+
+		// Remove the container with retry logic
+		if removeContainerWithRetry(ctx, cli, cont.ID) {
+			removed++
+		}
+	}
+
+	if removed > 0 {
+		fmt.Printf("Removed %d containers for run ID %s\n", removed, runID)
+	}
+
+	return nil
+}
+
+// cleanupStaleTestContainers removes stopped/exited test containers without affecting running tests.
+// This is useful for cleaning up leftover containers from previous crashed or interrupted test runs
+// without interfering with currently running concurrent tests.
+func cleanupStaleTestContainers(ctx context.Context) error {
+	cli, err := createDockerClient(ctx)
+	if err != nil {
+		return fmt.Errorf("creating Docker client: %w", err)
+	}
+	defer cli.Close()
+
+	// Only get stopped/exited containers
+	containers, err := cli.ContainerList(ctx, container.ListOptions{
+		All: true,
+		Filters: filters.NewArgs(
+			filters.Arg("status", "exited"),
+			filters.Arg("status", "dead"),
+		),
+	})
+	if err != nil {
+		return fmt.Errorf("listing stopped containers: %w", err)
+	}
+
+	removed := 0
+
+	for _, cont := range containers {
+		// Only remove containers that look like test containers
+		shouldRemove := false
+
+		for _, name := range cont.Names {
+			if strings.Contains(name, "headscale-test-suite") ||
+				strings.Contains(name, "hs-") ||
+				strings.Contains(name, "ts-") ||
+				strings.Contains(name, "derp-") {
+				shouldRemove = true
+				break
+			}
+		}
+
+		if shouldRemove {
+			if removeContainerWithRetry(ctx, cli, cont.ID) {
+				removed++
+			}
+		}
+	}
+
+	if removed > 0 {
+		fmt.Printf("Removed %d stale test containers\n", removed)
+	}
+
+	return nil
+}
+
+const (
+	containerRemoveInitialInterval = 100 * time.Millisecond
+	containerRemoveMaxElapsedTime  = 2 * time.Second
+)
+
 // removeContainerWithRetry attempts to remove a container with exponential backoff retry logic.
 func removeContainerWithRetry(ctx context.Context, cli *client.Client, containerID string) bool {
-	maxRetries := 3
-	baseDelay := 100 * time.Millisecond
+	expBackoff := backoff.NewExponentialBackOff()
+	expBackoff.InitialInterval = containerRemoveInitialInterval

-	for attempt := range maxRetries {
+	_, err := backoff.Retry(ctx, func() (struct{}, error) {
 		err := cli.ContainerRemove(ctx, containerID, container.RemoveOptions{
 			Force: true,
 		})
-		if err == nil {
-			return true
+		if err != nil {
+			return struct{}{}, err
 		}

-		// If this is the last attempt, don't wait
-		if attempt == maxRetries-1 {
-			break
-		}
+		return struct{}{}, nil
+	}, backoff.WithBackOff(expBackoff), backoff.WithMaxElapsedTime(containerRemoveMaxElapsedTime))

-		// Wait with exponential backoff
-		delay := baseDelay * time.Duration(1<<attempt)
-		time.Sleep(delay)
-	}
-
-	return false
+	return err == nil
 }

 // pruneDockerNetworks removes unused Docker networks.
 func pruneDockerNetworks(ctx context.Context) error {
-	cli, err := createDockerClient()
+	cli, err := createDockerClient(ctx)
 	if err != nil {
-		return fmt.Errorf("failed to create Docker client: %w", err)
+		return fmt.Errorf("creating Docker client: %w", err)
 	}
 	defer cli.Close()

 	report, err := cli.NetworksPrune(ctx, filters.Args{})
 	if err != nil {
-		return fmt.Errorf("failed to prune networks: %w", err)
+		return fmt.Errorf("pruning networks: %w", err)
 	}

 	if len(report.NetworksDeleted) > 0 {
@@ -133,9 +247,9 @@ func pruneDockerNetworks(ctx context.Context) error {

 // cleanOldImages removes test-related and old dangling Docker images.
 func cleanOldImages(ctx context.Context) error {
-	cli, err := createDockerClient()
+	cli, err := createDockerClient(ctx)
 	if err != nil {
-		return fmt.Errorf("failed to create Docker client: %w", err)
+		return fmt.Errorf("creating Docker client: %w", err)
 	}
 	defer cli.Close()

@@ -143,12 +257,14 @@ func cleanOldImages(ctx context.Context) error {
 		All: true,
 	})
 	if err != nil {
-		return fmt.Errorf("failed to list images: %w", err)
+		return fmt.Errorf("listing images: %w", err)
 	}

 	removed := 0
+
 	for _, img := range images {
 		shouldRemove := false
+
 		for _, tag := range img.RepoTags {
 			if strings.Contains(tag, "hs-") ||
 				strings.Contains(tag, "headscale-integration") ||
@@ -183,18 +299,19 @@ func cleanOldImages(ctx context.Context) error {

 // cleanCacheVolume removes the Docker volume used for Go module cache.
 func cleanCacheVolume(ctx context.Context) error {
-	cli, err := createDockerClient()
+	cli, err := createDockerClient(ctx)
 	if err != nil {
-		return fmt.Errorf("failed to create Docker client: %w", err)
+		return fmt.Errorf("creating Docker client: %w", err)
 	}
 	defer cli.Close()

 	volumeName := "hs-integration-go-cache"
+
 	err = cli.VolumeRemove(ctx, volumeName, true)
 	if err != nil {
-		if errdefs.IsNotFound(err) {
+		if errdefs.IsNotFound(err) { //nolint:staticcheck // SA1019: deprecated but functional
 			fmt.Printf("Go module cache volume not found: %s\n", volumeName)
-		} else if errdefs.IsConflict(err) {
+		} else if errdefs.IsConflict(err) { //nolint:staticcheck // SA1019: deprecated but functional
 			fmt.Printf("Go module cache volume is in use and cannot be removed: %s\n", volumeName)
 		} else {
 			fmt.Printf("Failed to remove Go module cache volume %s: %v\n", volumeName, err)
@@ -205,3 +322,110 @@ func cleanCacheVolume(ctx context.Context) error {

 	return nil
 }
+
+// cleanupSuccessfulTestArtifacts removes artifacts from successful test runs to save disk space.
+// This function removes large artifacts that are mainly useful for debugging failures:
+// - Database dumps (.db files)
+// - Profile data (pprof directories)
+// - MapResponse data (mapresponses directories)
+// - Prometheus metrics (_metrics.txt) and status (_status.json) files
+//
+// It preserves:
+// - Log files (.log) which are small and useful for verification.
+func cleanupSuccessfulTestArtifacts(logsDir string, verbose bool) error {
+	entries, err := os.ReadDir(logsDir)
+	if err != nil {
+		return fmt.Errorf("reading logs directory: %w", err)
+	}
+
+	var (
+		removedFiles, removedDirs int
+		totalSize                 int64
+	)
+
+	for _, entry := range entries {
+		name := entry.Name()
+		fullPath := filepath.Join(logsDir, name)
+
+		if entry.IsDir() {
+			// Remove pprof and mapresponses directories (typically large)
+			// These directories contain artifacts from all containers in the test run
+			if name == "pprof" || name == "mapresponses" {
+				size, sizeErr := getDirSize(fullPath)
+				if sizeErr == nil {
+					totalSize += size
+				}
+
+				err := os.RemoveAll(fullPath)
+				if err != nil {
+					if verbose {
+						log.Printf("Warning: failed to remove directory %s: %v", name, err)
+					}
+				} else {
+					removedDirs++
+
+					if verbose {
+						log.Printf("Removed directory: %s/", name)
+					}
+				}
+			}
+		} else {
+			// Only process test-related files (headscale and tailscale)
+			if !strings.HasPrefix(name, "hs-") && !strings.HasPrefix(name, "ts-") {
+				continue
+			}
+
+			// Remove database, metrics, and status files, but keep logs
+			shouldRemove := strings.HasSuffix(name, ".db") ||
+				strings.HasSuffix(name, "_metrics.txt") ||
+				strings.HasSuffix(name, "_status.json")
+
+			if shouldRemove {
+				info, infoErr := entry.Info()
+				if infoErr == nil {
+					totalSize += info.Size()
+				}
+
+				err := os.Remove(fullPath)
+				if err != nil {
+					if verbose {
+						log.Printf("Warning: failed to remove file %s: %v", name, err)
+					}
+				} else {
+					removedFiles++
+
+					if verbose {
+						log.Printf("Removed file: %s", name)
+					}
+				}
+			}
+		}
+	}
+
+	if removedFiles > 0 || removedDirs > 0 {
+		const bytesPerMB = 1024 * 1024
+		log.Printf("Cleaned up %d files and %d directories (freed ~%.2f MB)",
+			removedFiles, removedDirs, float64(totalSize)/bytesPerMB)
+	}
+
+	return nil
+}
+
+// getDirSize returns the total size in bytes of all non-directory entries under path.
+func getDirSize(path string) (int64, error) {
+	var size int64
+
+	err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
+		if err != nil {
+			return err
+		}
+
+		if !info.IsDir() {
+			size += info.Size()
+		}
+
+		return nil
+	})
+
+	return size, err
+}
--- a/cmd/hi/docker.go
+++ b/cmd/hi/docker.go
@@ -22,17 +22,22 @@ import (
 	"github.com/juanfont/headscale/integration/dockertestutil"
 )

+const defaultDirPerm = 0o755
+
 var (
 	ErrTestFailed              = errors.New("test failed")
 	ErrUnexpectedContainerWait = errors.New("unexpected end of container wait")
 	ErrNoDockerContext         = errors.New("no docker context found")
+	ErrMemoryLimitViolations   = errors.New("container(s) exceeded memory limits")
 )

 // runTestContainer executes integration tests in a Docker container.
+//
+//nolint:gocyclo // complex test orchestration function
 func runTestContainer(ctx context.Context, config *RunConfig) error {
-	cli, err := createDockerClient()
+	cli, err := createDockerClient(ctx)
 	if err != nil {
-		return fmt.Errorf("failed to create Docker client: %w", err)
+		return fmt.Errorf("creating Docker client: %w", err)
 	}
 	defer cli.Close()

@@ -48,19 +53,21 @@ func runTestContainer(ctx context.Context, config *RunConfig) error {

 	absLogsDir, err := filepath.Abs(logsDir)
 	if err != nil {
-		return fmt.Errorf("failed to get absolute path for logs directory: %w", err)
+		return fmt.Errorf("getting absolute path for logs directory: %w", err)
 	}

 	const dirPerm = 0o755
-	if err := os.MkdirAll(absLogsDir, dirPerm); err != nil {
-		return fmt.Errorf("failed to create logs directory: %w", err)
+	if err := os.MkdirAll(absLogsDir, dirPerm); err != nil { //nolint:noinlineerr
+		return fmt.Errorf("creating logs directory: %w", err)
 	}

 	if config.CleanBefore {
 		if config.Verbose {
 			log.Printf("Running pre-test cleanup...")
 		}
-		if err := cleanupBeforeTest(ctx); err != nil && config.Verbose {
+
+		err := cleanupBeforeTest(ctx)
+		if err != nil && config.Verbose {
 			log.Printf("Warning: pre-test cleanup failed: %v", err)
 		}
 	}
@@ -71,52 +78,118 @@ func runTestContainer(ctx context.Context, config *RunConfig) error {
 	}

 	imageName := "golang:" + config.GoVersion
-	if err := ensureImageAvailable(ctx, cli, imageName, config.Verbose); err != nil {
-		return fmt.Errorf("failed to ensure image availability: %w", err)
+	if err := ensureImageAvailable(ctx, cli, imageName, config.Verbose); err != nil { //nolint:noinlineerr
+		return fmt.Errorf("ensuring image availability: %w", err)
 	}

 	resp, err := createGoTestContainer(ctx, cli, config, containerName, absLogsDir, goTestCmd)
 	if err != nil {
-		return fmt.Errorf("failed to create container: %w", err)
+		return fmt.Errorf("creating container: %w", err)
 	}

 	if config.Verbose {
 		log.Printf("Created container: %s", resp.ID)
 	}

-	if err := cli.ContainerStart(ctx, resp.ID, container.StartOptions{}); err != nil {
-		return fmt.Errorf("failed to start container: %w", err)
+	if err := cli.ContainerStart(ctx, resp.ID, container.StartOptions{}); err != nil { //nolint:noinlineerr
+		return fmt.Errorf("starting container: %w", err)
 	}

 	log.Printf("Starting test: %s", config.TestPattern)
+	log.Printf("Run ID: %s", runID)
+	log.Printf("Monitor with: docker logs -f %s", containerName)
+	log.Printf("Logs directory: %s", logsDir)
+
+	// Start stats collection for container resource monitoring (if enabled)
+	var statsCollector *StatsCollector
+
+	if config.Stats {
+		var err error
+
+		statsCollector, err = NewStatsCollector(ctx)
+		if err != nil {
+			if config.Verbose {
+				log.Printf("Warning: failed to create stats collector: %v", err)
+			}
+
+			statsCollector = nil
+		}
+
+		if statsCollector != nil {
+			defer statsCollector.Close()
+
+			// Start stats collection immediately; no retry logic is needed because
+			// the collector monitors Docker events and catches containers as they start.
+			err := statsCollector.StartCollection(ctx, runID, config.Verbose)
+			if err != nil {
+				if config.Verbose {
+					log.Printf("Warning: failed to start stats collection: %v", err)
+				}
+			}
+			defer statsCollector.StopCollection()
+		}
+	}

 	exitCode, err := streamAndWait(ctx, cli, resp.ID)

 	// Ensure all containers have finished and logs are flushed before extracting artifacts
-	if waitErr := waitForContainerFinalization(ctx, cli, resp.ID, config.Verbose); waitErr != nil && config.Verbose {
+	waitErr := waitForContainerFinalization(ctx, cli, resp.ID, config.Verbose)
+	if waitErr != nil && config.Verbose {
 		log.Printf("Warning: failed to wait for container finalization: %v", waitErr)
 	}

 	// Extract artifacts from test containers before cleanup
-	if err := extractArtifactsFromContainers(ctx, resp.ID, logsDir, config.Verbose); err != nil && config.Verbose {
+	if err := extractArtifactsFromContainers(ctx, resp.ID, logsDir, config.Verbose); err != nil && config.Verbose { //nolint:noinlineerr
 		log.Printf("Warning: failed to extract artifacts from containers: %v", err)
 	}

 	// Always list control files regardless of test outcome
 	listControlFiles(logsDir)

+	// Print stats summary and check memory limits if enabled
+	if config.Stats && statsCollector != nil {
+		violations := statsCollector.PrintSummaryAndCheckLimits(config.HSMemoryLimit, config.TSMemoryLimit)
+		if len(violations) > 0 {
+			log.Printf("MEMORY LIMIT VIOLATIONS DETECTED:")
+			log.Printf("=================================")
+
+			for _, violation := range violations {
+				log.Printf("Container %s exceeded memory limit: %.1f MB > %.1f MB",
+					violation.ContainerName, violation.MaxMemoryMB, violation.LimitMB)
+			}
+
+			return fmt.Errorf("test failed: %d %w", len(violations), ErrMemoryLimitViolations)
+		}
+	}
+
 	shouldCleanup := config.CleanAfter && (!config.KeepOnFailure || exitCode == 0)
 	if shouldCleanup {
 		if config.Verbose {
-			log.Printf("Running post-test cleanup...")
+			log.Printf("Running post-test cleanup for run %s...", runID)
 		}
-		if cleanErr := cleanupAfterTest(ctx, cli, resp.ID); cleanErr != nil && config.Verbose {
+
+		cleanErr := cleanupAfterTest(ctx, cli, resp.ID, runID)
+
+		if cleanErr != nil && config.Verbose {
 			log.Printf("Warning: post-test cleanup failed: %v", cleanErr)
 		}
+
+		// Clean up artifacts from successful tests to save disk space in CI
+		if exitCode == 0 {
+			if config.Verbose {
+				log.Printf("Test succeeded, cleaning up artifacts to save disk space...")
+			}
+
+			cleanErr := cleanupSuccessfulTestArtifacts(logsDir, config.Verbose)
+
+			if cleanErr != nil && config.Verbose {
+				log.Printf("Warning: artifact cleanup failed: %v", cleanErr)
+			}
+		}
 	}

 	if err != nil {
-		return fmt.Errorf("test execution failed: %w", err)
+		return fmt.Errorf("executing test: %w", err)
 	}

 	if exitCode != 0 {
@@ -150,7 +223,7 @@ func buildGoTestCommand(config *RunConfig) []string {
 func createGoTestContainer(ctx context.Context, cli *client.Client, config *RunConfig, containerName, logsDir string, goTestCmd []string) (container.CreateResponse, error) {
 	pwd, err := os.Getwd()
 	if err != nil {
-		return container.CreateResponse{}, fmt.Errorf("failed to get working directory: %w", err)
+		return container.CreateResponse{}, fmt.Errorf("getting working directory: %w", err)
 	}

 	projectRoot := findProjectRoot(pwd)
@@ -161,6 +234,28 @@ func createGoTestContainer(ctx context.Context, cli *client.Client, config *RunC
 		fmt.Sprintf("HEADSCALE_INTEGRATION_POSTGRES=%d", boolToInt(config.UsePostgres)),
 		"HEADSCALE_INTEGRATION_RUN_ID=" + runID,
 	}
+
+	// Pass through CI environment variable for CI detection
+	if ci := os.Getenv("CI"); ci != "" {
+		env = append(env, "CI="+ci)
+	}
+
+	// Pass through all HEADSCALE_INTEGRATION_* environment variables
+	for _, e := range os.Environ() {
+		if strings.HasPrefix(e, "HEADSCALE_INTEGRATION_") {
+			// Skip the ones we already set explicitly
+			if strings.HasPrefix(e, "HEADSCALE_INTEGRATION_POSTGRES=") ||
+				strings.HasPrefix(e, "HEADSCALE_INTEGRATION_RUN_ID=") {
+				continue
+			}
+
+			env = append(env, e)
+		}
+	}
+
+	// Set GOCACHE to a known location (used by both bind mount and volume cases)
+	env = append(env, "GOCACHE=/cache/go-build")
+
 	containerConfig := &container.Config{
 		Image:      "golang:" + config.GoVersion,
 		Cmd:        goTestCmd,
@@ -180,20 +275,43 @@ func createGoTestContainer(ctx context.Context, cli *client.Client, config *RunC
 		log.Printf("Using Docker socket: %s", dockerSocketPath)
 	}

+	binds := []string{
+		fmt.Sprintf("%s:%s", projectRoot, projectRoot),
+		dockerSocketPath + ":/var/run/docker.sock",
+		logsDir + ":/tmp/control",
+	}
+
+	// Use bind mounts for Go cache if provided via environment variables,
+	// otherwise fall back to Docker volumes for local development
+	var mounts []mount.Mount
+
+	goCache := os.Getenv("HEADSCALE_INTEGRATION_GO_CACHE")
+	goBuildCache := os.Getenv("HEADSCALE_INTEGRATION_GO_BUILD_CACHE")
+
+	if goCache != "" {
+		binds = append(binds, goCache+":/go")
+	} else {
+		mounts = append(mounts, mount.Mount{
+			Type:   mount.TypeVolume,
+			Source: "hs-integration-go-cache",
+			Target: "/go",
+		})
+	}
+
+	if goBuildCache != "" {
+		binds = append(binds, goBuildCache+":/cache/go-build")
+	} else {
+		mounts = append(mounts, mount.Mount{
+			Type:   mount.TypeVolume,
+			Source: "hs-integration-go-build-cache",
+			Target: "/cache/go-build",
+		})
+	}
+
 	hostConfig := &container.HostConfig{
 		AutoRemove: false, // We'll remove manually for better control
-		Binds: []string{
-			fmt.Sprintf("%s:%s", projectRoot, projectRoot),
-			dockerSocketPath + ":/var/run/docker.sock",
-			logsDir + ":/tmp/control",
-		},
-		Mounts: []mount.Mount{
-			{
-				Type:   mount.TypeVolume,
-				Source: "hs-integration-go-cache",
-				Target: "/go",
-			},
-		},
+		Binds:      binds,
+		Mounts:     mounts,
 	}

 	return cli.ContainerCreate(ctx, containerConfig, hostConfig, nil, nil, containerName)
@@ -207,7 +325,7 @@ func streamAndWait(ctx context.Context, cli *client.Client, containerID string)
 		Follow:     true,
 	})
 	if err != nil {
-		return -1, fmt.Errorf("failed to get container logs: %w", err)
+		return -1, fmt.Errorf("getting container logs: %w", err)
 	}
 	defer out.Close()

@@ -219,7 +337,7 @@ func streamAndWait(ctx context.Context, cli *client.Client, containerID string)
 	select {
 	case err := <-errCh:
 		if err != nil {
-			return -1, fmt.Errorf("error waiting for container: %w", err)
+			return -1, fmt.Errorf("waiting for container: %w", err)
 		}
 	case status := <-statusCh:
 		return int(status.StatusCode), nil
@@ -233,7 +351,7 @@ func waitForContainerFinalization(ctx context.Context, cli *client.Client, testC
 	// First, get all related test containers
 	containers, err := cli.ContainerList(ctx, container.ListOptions{All: true})
 	if err != nil {
-		return fmt.Errorf("failed to list containers: %w", err)
+		return fmt.Errorf("listing containers: %w", err)
 	}

 	testContainers := getCurrentTestContainers(containers, testContainerID, verbose)
@@ -242,6 +360,7 @@ func waitForContainerFinalization(ctx context.Context, cli *client.Client, testC
 	maxWaitTime := 10 * time.Second
 	checkInterval := 500 * time.Millisecond
 	timeout := time.After(maxWaitTime)
+
 	ticker := time.NewTicker(checkInterval)
 	defer ticker.Stop()

@@ -251,6 +370,7 @@ func waitForContainerFinalization(ctx context.Context, cli *client.Client, testC
 			if verbose {
 				log.Printf("Timeout waiting for container finalization, proceeding with artifact extraction")
 			}
+
 			return nil
 		case <-ticker.C:
 			allFinalized := true
@@ -261,12 +381,14 @@ func waitForContainerFinalization(ctx context.Context, cli *client.Client, testC
 					if verbose {
 						log.Printf("Warning: failed to inspect container %s: %v", testCont.name, err)
 					}
+
 					continue
 				}

 				// Check if container is in a final state
 				if !isContainerFinalized(inspect.State) {
 					allFinalized = false
+
 					if verbose {
 						log.Printf("Container %s still finalizing (state: %s)", testCont.name, inspect.State.Status)
 					}
@@ -279,6 +401,7 @@ func waitForContainerFinalization(ctx context.Context, cli *client.Client, testC
 				if verbose {
 					log.Printf("All test containers finalized, ready for artifact extraction")
 				}
+
 				return nil
 			}
 		}
@@ -295,13 +418,15 @@ func isContainerFinalized(state *container.State) bool {
 func findProjectRoot(startPath string) string {
 	current := startPath
 	for {
-		if _, err := os.Stat(filepath.Join(current, "go.mod")); err == nil {
+		if _, err := os.Stat(filepath.Join(current, "go.mod")); err == nil { //nolint:noinlineerr
 			return current
 		}
+
 		parent := filepath.Dir(current)
 		if parent == current {
 			return startPath
 		}
+
 		current = parent
 	}
 }
@@ -311,34 +436,37 @@ func boolToInt(b bool) int {
 	if b {
 		return 1
 	}
+
 	return 0
 }

 // DockerContext represents Docker context information.
 type DockerContext struct {
-	Name      string                 `json:"Name"`
-	Metadata  map[string]interface{} `json:"Metadata"`
-	Endpoints map[string]interface{} `json:"Endpoints"`
-	Current   bool                   `json:"Current"`
+	Name      string         `json:"Name"`
+	Metadata  map[string]any `json:"Metadata"`
+	Endpoints map[string]any `json:"Endpoints"`
+	Current   bool           `json:"Current"`
 }

 // createDockerClient creates a Docker client with context detection.
-func createDockerClient() (*client.Client, error) {
-	contextInfo, err := getCurrentDockerContext()
+func createDockerClient(ctx context.Context) (*client.Client, error) {
+	contextInfo, err := getCurrentDockerContext(ctx)
 	if err != nil {
 		return client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
 	}

 	var clientOpts []client.Opt
+
 	clientOpts = append(clientOpts, client.WithAPIVersionNegotiation())

 	if contextInfo != nil {
 		if endpoints, ok := contextInfo.Endpoints["docker"]; ok {
-			if endpointMap, ok := endpoints.(map[string]interface{}); ok {
+			if endpointMap, ok := endpoints.(map[string]any); ok {
 				if host, ok := endpointMap["Host"].(string); ok {
 					if runConfig.Verbose {
 						log.Printf("Using Docker host from context '%s': %s", contextInfo.Name, host)
 					}
+
 					clientOpts = append(clientOpts, client.WithHost(host))
 				}
 			}
@@ -353,16 +481,17 @@ func createDockerClient() (*client.Client, error) {
 }

 // getCurrentDockerContext retrieves the current Docker context information.
-func getCurrentDockerContext() (*DockerContext, error) {
-	cmd := exec.Command("docker", "context", "inspect")
+func getCurrentDockerContext(ctx context.Context) (*DockerContext, error) {
+	cmd := exec.CommandContext(ctx, "docker", "context", "inspect")
+
 	output, err := cmd.Output()
 	if err != nil {
-		return nil, fmt.Errorf("failed to get docker context: %w", err)
+		return nil, fmt.Errorf("getting docker context: %w", err)
 	}

 	var contexts []DockerContext
-	if err := json.Unmarshal(output, &contexts); err != nil {
-		return nil, fmt.Errorf("failed to parse docker context: %w", err)
+	if err := json.Unmarshal(output, &contexts); err != nil { //nolint:noinlineerr
+		return nil, fmt.Errorf("parsing docker context: %w", err)
 	}

 	if len(contexts) > 0 {
@@ -379,28 +508,58 @@ func getDockerSocketPath() string {
 	return "/var/run/docker.sock"
 }

-// ensureImageAvailable pulls the specified Docker image to ensure it's available.
+// checkImageAvailableLocally checks if the specified Docker image is available locally.
+func checkImageAvailableLocally(ctx context.Context, cli *client.Client, imageName string) (bool, error) {
+	_, _, err := cli.ImageInspectWithRaw(ctx, imageName) //nolint:staticcheck // SA1019: deprecated but functional
+	if err != nil {
+		if client.IsErrNotFound(err) { //nolint:staticcheck // SA1019: deprecated but functional
+			return false, nil
+		}
+
+		return false, fmt.Errorf("inspecting image %s: %w", imageName, err)
+	}
+
+	return true, nil
+}
+
+// ensureImageAvailable checks if the image is available locally first, then pulls if needed.
 func ensureImageAvailable(ctx context.Context, cli *client.Client, imageName string, verbose bool) error {
+	// First check if image is available locally
+	available, err := checkImageAvailableLocally(ctx, cli, imageName)
+	if err != nil {
+		return fmt.Errorf("checking local image availability: %w", err)
+	}
+
+	if available {
+		if verbose {
+			log.Printf("Image %s is available locally", imageName)
+		}
+
+		return nil
+	}
+
+	// Image not available locally, try to pull it
 	if verbose {
-		log.Printf("Pulling image %s...", imageName)
+		log.Printf("Image %s not found locally, pulling...", imageName)
 	}

 	reader, err := cli.ImagePull(ctx, imageName, image.PullOptions{})
 	if err != nil {
-		return fmt.Errorf("failed to pull image %s: %w", imageName, err)
+		return fmt.Errorf("pulling image %s: %w", imageName, err)
 	}
 	defer reader.Close()

 	if verbose {
 		_, err = io.Copy(os.Stdout, reader)
 		if err != nil {
-			return fmt.Errorf("failed to read pull output: %w", err)
+			return fmt.Errorf("reading pull output: %w", err)
 		}
 	} else {
 		_, err = io.Copy(io.Discard, reader)
 		if err != nil {
-			return fmt.Errorf("failed to read pull output: %w", err)
+			return fmt.Errorf("reading pull output: %w", err)
 		}
+
 		log.Printf("Image %s pulled successfully", imageName)
 	}

@@ -415,9 +574,11 @@ func listControlFiles(logsDir string) {
 		return
 	}

-	var logFiles []string
-	var dataFiles []string
-	var dataDirs []string
+	var (
+		logFiles  []string
+		dataFiles []string
+		dataDirs  []string
+	)

 	for _, entry := range entries {
 		name := entry.Name()
@@ -446,6 +607,7 @@ func listControlFiles(logsDir string) {

 	if len(logFiles) > 0 {
 		log.Printf("Headscale logs:")
+
 		for _, file := range logFiles {
 			log.Printf("  %s", file)
 		}
@@ -453,9 +615,11 @@ func listControlFiles(logsDir string) {

 	if len(dataFiles) > 0 || len(dataDirs) > 0 {
 		log.Printf("Headscale data:")
+
 		for _, file := range dataFiles {
 			log.Printf("  %s", file)
 		}
+
 		for _, dir := range dataDirs {
 			log.Printf("  %s/", dir)
 		}
@@ -464,25 +628,27 @@ func listControlFiles(logsDir string) {

 // extractArtifactsFromContainers collects container logs and files from the specific test run.
 func extractArtifactsFromContainers(ctx context.Context, testContainerID, logsDir string, verbose bool) error {
-	cli, err := createDockerClient()
+	cli, err := createDockerClient(ctx)
 	if err != nil {
-		return fmt.Errorf("failed to create Docker client: %w", err)
+		return fmt.Errorf("creating Docker client: %w", err)
 	}
 	defer cli.Close()

 	// List all containers
 	containers, err := cli.ContainerList(ctx, container.ListOptions{All: true})
 	if err != nil {
-		return fmt.Errorf("failed to list containers: %w", err)
+		return fmt.Errorf("listing containers: %w", err)
 	}

 	// Get containers from the specific test run
 	currentTestContainers := getCurrentTestContainers(containers, testContainerID, verbose)

 	extractedCount := 0
+
 	for _, cont := range currentTestContainers {
 		// Extract container logs and tar files
-		if err := extractContainerArtifacts(ctx, cli, cont.ID, cont.name, logsDir, verbose); err != nil {
+		err := extractContainerArtifacts(ctx, cli, cont.ID, cont.name, logsDir, verbose)
+		if err != nil {
 			if verbose {
 				log.Printf("Warning: failed to extract artifacts from container %s (%s): %v", cont.name, cont.ID[:12], err)
 			}
@@ -490,6 +656,7 @@ func extractArtifactsFromContainers(ctx context.Context, testContainerID, logsDi
 			if verbose {
 				log.Printf("Extracted artifacts from container %s (%s)", cont.name, cont.ID[:12])
 			}
+
 			extractedCount++
 		}
 	}
@@ -513,11 +680,13 @@ func getCurrentTestContainers(containers []container.Summary, testContainerID st

 	// Find the test container to get its run ID label
 	var runID string
+
 	for _, cont := range containers {
 		if cont.ID == testContainerID {
 			if cont.Labels != nil {
 				runID = cont.Labels["hi.run-id"]
 			}
+
 			break
 		}
 	}
@@ -558,18 +727,21 @@ func getCurrentTestContainers(containers []container.Summary, testContainerID st
 // extractContainerArtifacts saves logs and tar files from a container.
 func extractContainerArtifacts(ctx context.Context, cli *client.Client, containerID, containerName, logsDir string, verbose bool) error {
 	// Ensure the logs directory exists
-	if err := os.MkdirAll(logsDir, 0o755); err != nil {
-		return fmt.Errorf("failed to create logs directory: %w", err)
+	err := os.MkdirAll(logsDir, defaultDirPerm)
+	if err != nil {
+		return fmt.Errorf("creating logs directory: %w", err)
 	}

 	// Extract container logs
-	if err := extractContainerLogs(ctx, cli, containerID, containerName, logsDir, verbose); err != nil {
-		return fmt.Errorf("failed to extract logs: %w", err)
+	err = extractContainerLogs(ctx, cli, containerID, containerName, logsDir, verbose)
+	if err != nil {
+		return fmt.Errorf("extracting logs: %w", err)
 	}

 	// Extract tar files for headscale containers only
 	if strings.HasPrefix(containerName, "hs-") {
-		if err := extractContainerFiles(ctx, cli, containerID, containerName, logsDir, verbose); err != nil {
+		err := extractContainerFiles(ctx, cli, containerID, containerName, logsDir, verbose)
+		if err != nil {
 			if verbose {
 				log.Printf("Warning: failed to extract files from %s: %v", containerName, err)
 			}
@@ -591,7 +763,7 @@ func extractContainerLogs(ctx context.Context, cli *client.Client, containerID,
 		Tail:       "all",
 	})
 	if err != nil {
-		return fmt.Errorf("failed to get container logs: %w", err)
+		return fmt.Errorf("getting container logs: %w", err)
 	}
 	defer logReader.Close()

@@ -605,17 +777,17 @@ func extractContainerLogs(ctx context.Context, cli *client.Client, containerID,
 	// Demultiplex the Docker logs stream to separate stdout and stderr
 	_, err = stdcopy.StdCopy(&stdoutBuf, &stderrBuf, logReader)
 	if err != nil {
-		return fmt.Errorf("failed to demultiplex container logs: %w", err)
+		return fmt.Errorf("demultiplexing container logs: %w", err)
 	}

 	// Write stdout logs
-	if err := os.WriteFile(stdoutPath, stdoutBuf.Bytes(), 0o644); err != nil {
-		return fmt.Errorf("failed to write stdout log: %w", err)
+	if err := os.WriteFile(stdoutPath, stdoutBuf.Bytes(), 0o644); err != nil { //nolint:gosec,noinlineerr // log files should be readable
+		return fmt.Errorf("writing stdout log: %w", err)
 	}

 	// Write stderr logs
-	if err := os.WriteFile(stderrPath, stderrBuf.Bytes(), 0o644); err != nil {
-		return fmt.Errorf("failed to write stderr log: %w", err)
+	if err := os.WriteFile(stderrPath, stderrBuf.Bytes(), 0o644); err != nil { //nolint:gosec,noinlineerr // log files should be readable
+		return fmt.Errorf("writing stderr log: %w", err)
 	}

 	if verbose {
@@ -633,63 +805,3 @@ func extractContainerFiles(ctx context.Context, cli *client.Client, containerID,
 	// This function is kept for potential future use or other file types
 	return nil
 }
-
-// logExtractionError logs extraction errors with appropriate level based on error type.
-func logExtractionError(artifactType, containerName string, err error, verbose bool) {
-	if errors.Is(err, ErrFileNotFoundInTar) {
-		// File not found is expected and only logged in verbose mode
-		if verbose {
-			log.Printf("No %s found in container %s", artifactType, containerName)
-		}
-	} else {
-		// Other errors are actual failures and should be logged as warnings
-		log.Printf("Warning: failed to extract %s from %s: %v", artifactType, containerName, err)
-	}
-}
-
-// extractSingleFile copies a single file from a container.
-func extractSingleFile(ctx context.Context, cli *client.Client, containerID, sourcePath, fileName, logsDir string, verbose bool) error {
-	tarReader, _, err := cli.CopyFromContainer(ctx, containerID, sourcePath)
-	if err != nil {
-		return fmt.Errorf("failed to copy %s from container: %w", sourcePath, err)
-	}
-	defer tarReader.Close()
-
-	// Extract the single file from the tar
-	filePath := filepath.Join(logsDir, fileName)
-	if err := extractFileFromTar(tarReader, filepath.Base(sourcePath), filePath); err != nil {
-		return fmt.Errorf("failed to extract file from tar: %w", err)
-	}
-
-	if verbose {
-		log.Printf("Extracted %s from %s", fileName, containerID[:12])
-	}
-
-	return nil
-}
-
-// extractDirectory copies a directory from a container and extracts its contents.
-func extractDirectory(ctx context.Context, cli *client.Client, containerID, sourcePath, dirName, logsDir string, verbose bool) error {
-	tarReader, _, err := cli.CopyFromContainer(ctx, containerID, sourcePath)
-	if err != nil {
-		return fmt.Errorf("failed to copy %s from container: %w", sourcePath, err)
-	}
-	defer tarReader.Close()
-
-	// Create target directory
-	targetDir := filepath.Join(logsDir, dirName)
-	if err := os.MkdirAll(targetDir, 0o755); err != nil {
-		return fmt.Errorf("failed to create directory %s: %w", targetDir, err)
-	}
-
-	// Extract the directory from the tar
-	if err := extractDirectoryFromTar(tarReader, targetDir); err != nil {
-		return fmt.Errorf("failed to extract directory from tar: %w", err)
-	}
-
-	if verbose {
-		log.Printf("Extracted %s/ from %s", dirName, containerID[:12])
-	}
-
-	return nil
-}
--- a/cmd/hi/doctor.go
+++ b/cmd/hi/doctor.go
@@ -38,13 +38,13 @@ func runDoctorCheck(ctx context.Context) error {
 	}

 	// Check 3: Go installation
-	results = append(results, checkGoInstallation())
+	results = append(results, checkGoInstallation(ctx))

 	// Check 4: Git repository
-	results = append(results, checkGitRepository())
+	results = append(results, checkGitRepository(ctx))

 	// Check 5: Required files
-	results = append(results, checkRequiredFiles())
+	results = append(results, checkRequiredFiles(ctx))

 	// Display results
 	displayDoctorResults(results)
@@ -86,7 +86,7 @@ func checkDockerBinary() DoctorResult {

 // checkDockerDaemon verifies Docker daemon is running and accessible.
 func checkDockerDaemon(ctx context.Context) DoctorResult {
-	cli, err := createDockerClient()
+	cli, err := createDockerClient(ctx)
 	if err != nil {
 		return DoctorResult{
 			Name:    "Docker Daemon",
@@ -124,8 +124,8 @@ func checkDockerDaemon(ctx context.Context) DoctorResult {
 }

 // checkDockerContext verifies Docker context configuration.
-func checkDockerContext(_ context.Context) DoctorResult {
-	contextInfo, err := getCurrentDockerContext()
+func checkDockerContext(ctx context.Context) DoctorResult {
+	contextInfo, err := getCurrentDockerContext(ctx)
 	if err != nil {
 		return DoctorResult{
 			Name:    "Docker Context",
@@ -155,7 +155,7 @@ func checkDockerContext(_ context.Context) DoctorResult {

 // checkDockerSocket verifies Docker socket accessibility.
 func checkDockerSocket(ctx context.Context) DoctorResult {
-	cli, err := createDockerClient()
+	cli, err := createDockerClient(ctx)
 	if err != nil {
 		return DoctorResult{
 			Name:    "Docker Socket",
@@ -190,9 +190,9 @@ func checkDockerSocket(ctx context.Context) DoctorResult {
 	}
 }

-// checkGolangImage verifies we can access the golang Docker image.
+// checkGolangImage verifies the golang Docker image is available locally or can be pulled.
 func checkGolangImage(ctx context.Context) DoctorResult {
-	cli, err := createDockerClient()
+	cli, err := createDockerClient(ctx)
 	if err != nil {
 		return DoctorResult{
 			Name:    "Golang Image",
@@ -205,17 +205,40 @@ func checkGolangImage(ctx context.Context) DoctorResult {
 	goVersion := detectGoVersion()
 	imageName := "golang:" + goVersion

-	// Check if we can pull the image
+	// First check if image is available locally
+	available, err := checkImageAvailableLocally(ctx, cli, imageName)
+	if err != nil {
+		return DoctorResult{
+			Name:    "Golang Image",
+			Status:  "FAIL",
+			Message: fmt.Sprintf("Cannot check golang image %s: %v", imageName, err),
+			Suggestions: []string{
+				"Check Docker daemon status",
+				"Try: docker images | grep golang",
+			},
+		}
+	}
+
+	if available {
+		return DoctorResult{
+			Name:    "Golang Image",
+			Status:  "PASS",
+			Message: fmt.Sprintf("Golang image %s is available locally", imageName),
+		}
+	}
+
+	// Image not available locally, try to pull it
 	err = ensureImageAvailable(ctx, cli, imageName, false)
 	if err != nil {
 		return DoctorResult{
 			Name:    "Golang Image",
 			Status:  "FAIL",
-			Message: fmt.Sprintf("Cannot pull golang image %s: %v", imageName, err),
+			Message: fmt.Sprintf("Golang image %s not available locally and cannot pull: %v", imageName, err),
 			Suggestions: []string{
 				"Check internet connectivity",
 				"Verify Docker Hub access",
 				"Try: docker pull " + imageName,
+				"Or run tests offline if image was pulled previously",
 			},
 		}
 	}
@@ -223,12 +246,12 @@ func checkGolangImage(ctx context.Context) DoctorResult {
 	return DoctorResult{
 		Name:    "Golang Image",
 		Status:  "PASS",
-		Message: fmt.Sprintf("Golang image %s is available", imageName),
+		Message: fmt.Sprintf("Golang image %s is now available", imageName),
 	}
 }

 // checkGoInstallation verifies Go is installed and working.
-func checkGoInstallation() DoctorResult {
+func checkGoInstallation(ctx context.Context) DoctorResult {
 	_, err := exec.LookPath("go")
 	if err != nil {
 		return DoctorResult{
@@ -242,7 +265,8 @@ func checkGoInstallation() DoctorResult {
 		}
 	}

-	cmd := exec.Command("go", "version")
+	cmd := exec.CommandContext(ctx, "go", "version")
+
 	output, err := cmd.Output()
 	if err != nil {
 		return DoctorResult{
@@ -262,8 +286,9 @@ func checkGoInstallation() DoctorResult {
 }

 // checkGitRepository verifies we're in a git repository.
-func checkGitRepository() DoctorResult {
-	cmd := exec.Command("git", "rev-parse", "--git-dir")
+func checkGitRepository(ctx context.Context) DoctorResult {
+	cmd := exec.CommandContext(ctx, "git", "rev-parse", "--git-dir")
+
 	err := cmd.Run()
 	if err != nil {
 		return DoctorResult{
@@ -285,7 +310,7 @@ func checkGitRepository() DoctorResult {
 }

 // checkRequiredFiles verifies required files exist.
-func checkRequiredFiles() DoctorResult {
+func checkRequiredFiles(ctx context.Context) DoctorResult {
 	requiredFiles := []string{
 		"go.mod",
 		"integration/",
@@ -293,9 +318,12 @@ func checkRequiredFiles() DoctorResult {
 	}

 	var missingFiles []string
+
 	for _, file := range requiredFiles {
-		cmd := exec.Command("test", "-e", file)
-		if err := cmd.Run(); err != nil {
+		cmd := exec.CommandContext(ctx, "test", "-e", file)
+
+		err := cmd.Run()
+		if err != nil {
 			missingFiles = append(missingFiles, file)
 		}
 	}
@@ -327,6 +355,7 @@ func displayDoctorResults(results []DoctorResult) {

 	for _, result := range results {
 		var icon string
+
 		switch result.Status {
 		case "PASS":
 			icon = "✅"
--- a/cmd/hi/main.go
+++ b/cmd/hi/main.go
@@ -79,13 +79,18 @@ func main() {
 }

 func cleanAll(ctx context.Context) error {
-	if err := killTestContainers(ctx); err != nil {
+	err := killTestContainers(ctx)
+	if err != nil {
 		return err
 	}
-	if err := pruneDockerNetworks(ctx); err != nil {
+
+	err = pruneDockerNetworks(ctx)
+	if err != nil {
 		return err
 	}
-	if err := cleanOldImages(ctx); err != nil {
+
+	err = cleanOldImages(ctx)
+	if err != nil {
 		return err
 	}

--- a/cmd/hi/run.go
+++ b/cmd/hi/run.go
@@ -19,11 +19,14 @@ type RunConfig struct {
 	FailFast      bool          `flag:"failfast,default=true,Stop on first test failure"`
 	UsePostgres   bool          `flag:"postgres,default=false,Use PostgreSQL instead of SQLite"`
 	GoVersion     string        `flag:"go-version,Go version to use (auto-detected from go.mod)"`
-	CleanBefore   bool          `flag:"clean-before,default=true,Clean resources before test"`
+	CleanBefore   bool          `flag:"clean-before,default=true,Clean stale resources before test"`
 	CleanAfter    bool          `flag:"clean-after,default=true,Clean resources after test"`
 	KeepOnFailure bool          `flag:"keep-on-failure,default=false,Keep containers on test failure"`
 	LogsDir       string        `flag:"logs-dir,default=control_logs,Control logs directory"`
 	Verbose       bool          `flag:"verbose,default=false,Verbose output"`
+	Stats         bool          `flag:"stats,default=false,Collect and display container resource usage statistics"`
+	HSMemoryLimit float64       `flag:"hs-memory-limit,default=0,Fail test if any Headscale container exceeds this memory limit in MB (0 = disabled)"`
+	TSMemoryLimit float64       `flag:"ts-memory-limit,default=0,Fail test if any Tailscale container exceeds this memory limit in MB (0 = disabled)"`
 }

 // runIntegrationTest executes the integration test workflow.
@@ -45,7 +48,9 @@ func runIntegrationTest(env *command.Env) error {
 	if runConfig.Verbose {
 		log.Printf("Running pre-flight system checks...")
 	}
-	if err := runDoctorCheck(env.Context()); err != nil {
+
+	err := runDoctorCheck(env.Context())
+	if err != nil {
 		return fmt.Errorf("pre-flight checks failed: %w", err)
 	}

@@ -63,15 +68,15 @@ func runIntegrationTest(env *command.Env) error {
 func detectGoVersion() string {
 	goModPath := filepath.Join("..", "..", "go.mod")

-	if _, err := os.Stat("go.mod"); err == nil {
+	if _, err := os.Stat("go.mod"); err == nil { //nolint:noinlineerr
 		goModPath = "go.mod"
-	} else if _, err := os.Stat("../../go.mod"); err == nil {
+	} else if _, err := os.Stat("../../go.mod"); err == nil { //nolint:noinlineerr
 		goModPath = "../../go.mod"
 	}

 	content, err := os.ReadFile(goModPath)
 	if err != nil {
-		return "1.24"
+		return "1.26.1"
 	}

 	lines := splitLines(string(content))
@@ -86,13 +91,15 @@ func detectGoVersion() string {
 		}
 	}

-	return "1.24"
+	return "1.26.1"
 }

 // splitLines splits a string into lines without using strings.Split.
 func splitLines(s string) []string {
-	var lines []string
-	var current string
+	var (
+		lines   []string
+		current string
+	)

 	for _, char := range s {
 		if char == '\n' {
--- a/cmd/hi/stats.go
+++ b/cmd/hi/stats.go
@@ -0,0 +1,493 @@
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"log"
+	"sort"
+	"strings"
+	"sync"
+	"time"
+
+	"github.com/docker/docker/api/types"
+	"github.com/docker/docker/api/types/container"
+	"github.com/docker/docker/api/types/events"
+	"github.com/docker/docker/api/types/filters"
+	"github.com/docker/docker/client"
+)
+
+// ErrStatsCollectionAlreadyStarted is returned when trying to start stats collection that is already running.
+var ErrStatsCollectionAlreadyStarted = errors.New("stats collection already started")
+
+// ContainerStats represents statistics for a single container.
+type ContainerStats struct {
+	ContainerID   string
+	ContainerName string
+	Stats         []StatsSample
+	mutex         sync.RWMutex
+}
+
+// StatsSample represents a single stats measurement.
+type StatsSample struct {
+	Timestamp time.Time
+	CPUUsage  float64 // CPU usage percentage
+	MemoryMB  float64 // Memory usage in MB
+}
+
+// StatsCollector manages collection of container statistics.
+type StatsCollector struct {
+	client            *client.Client
+	containers        map[string]*ContainerStats
+	stopChan          chan struct{}
+	wg                sync.WaitGroup
+	mutex             sync.RWMutex
+	collectionStarted bool
+}
+
+// NewStatsCollector creates a new stats collector instance.
+func NewStatsCollector(ctx context.Context) (*StatsCollector, error) {
+	cli, err := createDockerClient(ctx)
+	if err != nil {
+		return nil, fmt.Errorf("creating Docker client: %w", err)
+	}
+
+	return &StatsCollector{
+		client:     cli,
+		containers: make(map[string]*ContainerStats),
+		stopChan:   make(chan struct{}),
+	}, nil
+}
+
+// StartCollection begins monitoring all containers and collecting stats for hs- and ts- containers with matching run ID.
+func (sc *StatsCollector) StartCollection(ctx context.Context, runID string, verbose bool) error {
+	sc.mutex.Lock()
+	defer sc.mutex.Unlock()
+
+	if sc.collectionStarted {
+		return ErrStatsCollectionAlreadyStarted
+	}
+
+	sc.collectionStarted = true
+
+	// Start monitoring existing containers
+	sc.wg.Add(1)
+
+	go sc.monitorExistingContainers(ctx, runID, verbose)
+
+	// Start Docker events monitoring for new containers
+	sc.wg.Add(1)
+
+	go sc.monitorDockerEvents(ctx, runID, verbose)
+
+	if verbose {
+		log.Printf("Started container monitoring for run ID %s", runID)
+	}
+
+	return nil
+}
+
+// StopCollection stops all stats collection.
+func (sc *StatsCollector) StopCollection() {
+	// Return early if collection was never started; the read lock is released before waiting
+	sc.mutex.RLock()
+
+	if !sc.collectionStarted {
+		sc.mutex.RUnlock()
+		return
+	}
+
+	sc.mutex.RUnlock()
+
+	// Signal stop to all goroutines
+	close(sc.stopChan)
+
+	// Wait for all goroutines to finish
+	sc.wg.Wait()
+
+	// Mark as stopped
+	sc.mutex.Lock()
+	sc.collectionStarted = false
+	sc.mutex.Unlock()
+}
+
+// monitorExistingContainers checks for existing containers that match our criteria.
+func (sc *StatsCollector) monitorExistingContainers(ctx context.Context, runID string, verbose bool) {
+	defer sc.wg.Done()
+
+	containers, err := sc.client.ContainerList(ctx, container.ListOptions{})
+	if err != nil {
+		if verbose {
+			log.Printf("Failed to list existing containers: %v", err)
+		}
+
+		return
+	}
+
+	for _, cont := range containers {
+		if sc.shouldMonitorContainer(cont, runID) {
+			sc.startStatsForContainer(ctx, cont.ID, cont.Names[0], verbose)
+		}
+	}
+}
+
+// monitorDockerEvents listens for container start events and begins monitoring relevant containers.
+func (sc *StatsCollector) monitorDockerEvents(ctx context.Context, runID string, verbose bool) {
+	defer sc.wg.Done()
+
+	filter := filters.NewArgs()
+	filter.Add("type", "container")
+	filter.Add("event", "start")
+
+	eventOptions := events.ListOptions{
+		Filters: filter,
+	}
+
+	events, errs := sc.client.Events(ctx, eventOptions)
+
+	for {
+		select {
+		case <-sc.stopChan:
+			return
+		case <-ctx.Done():
+			return
+		case event := <-events:
+			if event.Type == "container" && event.Action == "start" {
+				// Get container details
+				containerInfo, err := sc.client.ContainerInspect(ctx, event.ID) //nolint:staticcheck // SA1019: use Actor.ID
+				if err != nil {
+					continue
+				}
+
+				// Convert to types.Container format for consistency
+				cont := types.Container{ //nolint:staticcheck // SA1019: use container.Summary
+					ID:     containerInfo.ID,
+					Names:  []string{containerInfo.Name},
+					Labels: containerInfo.Config.Labels,
+				}
+
+				if sc.shouldMonitorContainer(cont, runID) {
+					sc.startStatsForContainer(ctx, cont.ID, cont.Names[0], verbose)
+				}
+			}
+		case err := <-errs:
+			if verbose {
+				log.Printf("Error in Docker events stream: %v", err)
+			}
+
+			return
+		}
+	}
+}
+
+// shouldMonitorContainer determines if a container should be monitored.
+func (sc *StatsCollector) shouldMonitorContainer(cont types.Container, runID string) bool { //nolint:staticcheck // SA1019: use container.Summary
+	// Check if it has the correct run ID label
+	if cont.Labels == nil || cont.Labels["hi.run-id"] != runID {
+		return false
+	}
+
+	// Check if it's an hs- or ts- container
+	for _, name := range cont.Names {
+		containerName := strings.TrimPrefix(name, "/")
+		if strings.HasPrefix(containerName, "hs-") || strings.HasPrefix(containerName, "ts-") {
+			return true
+		}
+	}
+
+	return false
+}
+
+// startStatsForContainer begins stats collection for a specific container.
+func (sc *StatsCollector) startStatsForContainer(ctx context.Context, containerID, containerName string, verbose bool) {
+	containerName = strings.TrimPrefix(containerName, "/")
+
+	sc.mutex.Lock()
+	// Check if we're already monitoring this container
+	if _, exists := sc.containers[containerID]; exists {
+		sc.mutex.Unlock()
+		return
+	}
+
+	sc.containers[containerID] = &ContainerStats{
+		ContainerID:   containerID,
+		ContainerName: containerName,
+		Stats:         make([]StatsSample, 0),
+	}
+	sc.mutex.Unlock()
+
+	if verbose {
+		log.Printf("Starting stats collection for container %s (%s)", containerName, containerID[:12])
+	}
+
+	sc.wg.Add(1)
+
+	go sc.collectStatsForContainer(ctx, containerID, verbose)
+}
+
+// collectStatsForContainer collects stats for a specific container using Docker API streaming.
+func (sc *StatsCollector) collectStatsForContainer(ctx context.Context, containerID string, verbose bool) {
+	defer sc.wg.Done()
+
+	// Use Docker API streaming stats - much more efficient than CLI
+	statsResponse, err := sc.client.ContainerStats(ctx, containerID, true)
+	if err != nil {
+		if verbose {
+			log.Printf("Failed to get stats stream for container %s: %v", containerID[:12], err)
+		}
+
+		return
+	}
+	defer statsResponse.Body.Close()
+
+	decoder := json.NewDecoder(statsResponse.Body)
+
+	var prevStats *container.Stats //nolint:staticcheck // SA1019: use StatsResponse
+
+	for {
+		select {
+		case <-sc.stopChan:
+			return
+		case <-ctx.Done():
+			return
+		default:
+			var stats container.Stats //nolint:staticcheck // SA1019: use StatsResponse
+
+			err := decoder.Decode(&stats)
+			if err != nil {
+				// EOF is expected when container stops or stream ends
+				if err.Error() != "EOF" && verbose {
+					log.Printf("Failed to decode stats for container %s: %v", containerID[:12], err)
+				}
+
+				return
+			}
+
+			// Calculate CPU percentage (only if we have previous stats)
+			var cpuPercent float64
+			if prevStats != nil {
+				cpuPercent = calculateCPUPercent(prevStats, &stats)
+			}
+
+			// Calculate memory usage in MB
+			memoryMB := float64(stats.MemoryStats.Usage) / (1024 * 1024)
+
+			// Store the sample (skip first sample since CPU calculation needs previous stats)
+			if prevStats != nil {
+				// Get container stats reference without holding the main mutex
+				var (
+					containerStats *ContainerStats
+					exists         bool
+				)
+
+				sc.mutex.RLock()
+				containerStats, exists = sc.containers[containerID]
+				sc.mutex.RUnlock()
+
+				if exists && containerStats != nil {
+					containerStats.mutex.Lock()
+					containerStats.Stats = append(containerStats.Stats, StatsSample{
+						Timestamp: time.Now(),
+						CPUUsage:  cpuPercent,
+						MemoryMB:  memoryMB,
+					})
+					containerStats.mutex.Unlock()
+				}
+			}
+
+			// Save current stats for next iteration
+			prevStats = &stats
+		}
+	}
+}
+
+// calculateCPUPercent calculates CPU usage percentage from Docker stats.
+func calculateCPUPercent(prevStats, stats *container.Stats) float64 { //nolint:staticcheck // SA1019: use StatsResponse
+	// CPU calculation based on Docker's implementation
+	cpuDelta := float64(stats.CPUStats.CPUUsage.TotalUsage) - float64(prevStats.CPUStats.CPUUsage.TotalUsage)
+	systemDelta := float64(stats.CPUStats.SystemUsage) - float64(prevStats.CPUStats.SystemUsage)
+
+	if systemDelta > 0 && cpuDelta >= 0 {
+		// Calculate CPU percentage: (container CPU delta / system CPU delta) * number of CPUs * 100
+		numCPUs := float64(len(stats.CPUStats.CPUUsage.PercpuUsage))
+		if numCPUs == 0 {
+			// Fallback: if PercpuUsage is not available, assume 1 CPU
+			numCPUs = 1.0
+		}
+
+		return (cpuDelta / systemDelta) * numCPUs * 100.0
+	}
+
+	return 0.0
+}
+
+// ContainerStatsSummary represents summary statistics for a container.
+type ContainerStatsSummary struct {
+	ContainerName string
+	SampleCount   int
+	CPU           StatsSummary
+	Memory        StatsSummary
+}
+
+// MemoryViolation represents a container that exceeded the memory limit.
+type MemoryViolation struct {
+	ContainerName string
+	MaxMemoryMB   float64
+	LimitMB       float64
+}
+
+// StatsSummary represents min, max, and average for a metric.
+type StatsSummary struct {
+	Min     float64
+	Max     float64
+	Average float64
+}
+
+// GetSummary returns a summary of collected statistics.
+func (sc *StatsCollector) GetSummary() []ContainerStatsSummary {
+	// Take snapshot of container references without holding main lock long
+	sc.mutex.RLock()
+
+	containerRefs := make([]*ContainerStats, 0, len(sc.containers))
+	for _, containerStats := range sc.containers {
+		containerRefs = append(containerRefs, containerStats)
+	}
+
+	sc.mutex.RUnlock()
+
+	summaries := make([]ContainerStatsSummary, 0, len(containerRefs))
+
+	for _, containerStats := range containerRefs {
+		containerStats.mutex.RLock()
+		stats := make([]StatsSample, len(containerStats.Stats))
+		copy(stats, containerStats.Stats)
+		containerName := containerStats.ContainerName
+		containerStats.mutex.RUnlock()
+
+		if len(stats) == 0 {
+			continue
+		}
+
+		summary := ContainerStatsSummary{
+			ContainerName: containerName,
+			SampleCount:   len(stats),
+		}
+
+		// Split samples into CPU and memory series for the summaries
+		cpuValues := make([]float64, len(stats))
+		memoryValues := make([]float64, len(stats))
+
+		for i, sample := range stats {
+			cpuValues[i] = sample.CPUUsage
+			memoryValues[i] = sample.MemoryMB
+		}
+
+		summary.CPU = calculateStatsSummary(cpuValues)
+		summary.Memory = calculateStatsSummary(memoryValues)
+
+		summaries = append(summaries, summary)
+	}
+
+	// Sort by container name for consistent output
+	sort.Slice(summaries, func(i, j int) bool {
+		return summaries[i].ContainerName < summaries[j].ContainerName
+	})
+
+	return summaries
+}
+
+// calculateStatsSummary calculates min, max, and average for a slice of values.
+func calculateStatsSummary(values []float64) StatsSummary {
+	if len(values) == 0 {
+		return StatsSummary{}
+	}
+
+	minVal := values[0]
+	maxVal := values[0]
+	sum := 0.0
+
+	for _, value := range values {
+		if value < minVal {
+			minVal = value
+		}
+
+		if value > maxVal {
+			maxVal = value
+		}
+
+		sum += value
+	}
+
+	return StatsSummary{
+		Min:     minVal,
+		Max:     maxVal,
+		Average: sum / float64(len(values)),
+	}
+}
+
+// PrintSummary prints the statistics summary to the console.
+func (sc *StatsCollector) PrintSummary() {
+	summaries := sc.GetSummary()
+
+	if len(summaries) == 0 {
+		log.Printf("No container statistics collected")
+		return
+	}
+
+	log.Printf("Container Resource Usage Summary:")
+	log.Printf("================================")
+
+	for _, summary := range summaries {
+		log.Printf("Container: %s (%d samples)", summary.ContainerName, summary.SampleCount)
+		log.Printf("  CPU Usage:    Min: %6.2f%%  Max: %6.2f%%  Avg: %6.2f%%",
+			summary.CPU.Min, summary.CPU.Max, summary.CPU.Average)
+		log.Printf("  Memory Usage: Min: %6.1f MB Max: %6.1f MB Avg: %6.1f MB",
+			summary.Memory.Min, summary.Memory.Max, summary.Memory.Average)
+		log.Printf("")
+	}
+}
+
+// CheckMemoryLimits checks if any containers exceeded their memory limits.
+func (sc *StatsCollector) CheckMemoryLimits(hsLimitMB, tsLimitMB float64) []MemoryViolation {
+	if hsLimitMB <= 0 && tsLimitMB <= 0 {
+		return nil
+	}
+
+	summaries := sc.GetSummary()
+
+	var violations []MemoryViolation
+
+	for _, summary := range summaries {
+		var limitMB float64
+		if strings.HasPrefix(summary.ContainerName, "hs-") {
+			limitMB = hsLimitMB
+		} else if strings.HasPrefix(summary.ContainerName, "ts-") {
+			limitMB = tsLimitMB
+		} else {
+			continue // Skip containers that don't match our patterns
+		}
+
+		if limitMB > 0 && summary.Memory.Max > limitMB {
+			violations = append(violations, MemoryViolation{
+				ContainerName: summary.ContainerName,
+				MaxMemoryMB:   summary.Memory.Max,
+				LimitMB:       limitMB,
+			})
+		}
+	}
+
+	return violations
+}
+
+// PrintSummaryAndCheckLimits prints the statistics summary and returns memory violations if any.
+func (sc *StatsCollector) PrintSummaryAndCheckLimits(hsLimitMB, tsLimitMB float64) []MemoryViolation {
+	sc.PrintSummary()
+	return sc.CheckMemoryLimits(hsLimitMB, tsLimitMB)
+}
+
+// Close closes the stats collector and cleans up resources.
+func (sc *StatsCollector) Close() error {
+	sc.StopCollection()
+	return sc.client.Close()
+}
--- a/cmd/hi/tar_utils.go
+++ b/cmd/hi/tar_utils.go
@@ -1,100 +0,0 @@
-package main
-
-import (
-	"archive/tar"
-	"errors"
-	"fmt"
-	"io"
-	"os"
-	"path/filepath"
-	"strings"
-)
-
-// ErrFileNotFoundInTar indicates a file was not found in the tar archive.
-var ErrFileNotFoundInTar = errors.New("file not found in tar")
-
-// extractFileFromTar extracts a single file from a tar reader.
-func extractFileFromTar(tarReader io.Reader, fileName, outputPath string) error {
-	tr := tar.NewReader(tarReader)
-
-	for {
-		header, err := tr.Next()
-		if err == io.EOF {
-			break
-		}
-		if err != nil {
-			return fmt.Errorf("failed to read tar header: %w", err)
-		}
-
-		// Check if this is the file we're looking for
-		if filepath.Base(header.Name) == fileName {
-			if header.Typeflag == tar.TypeReg {
-				// Create the output file
-				outFile, err := os.Create(outputPath)
-				if err != nil {
-					return fmt.Errorf("failed to create output file: %w", err)
-				}
-				defer outFile.Close()
-
-				// Copy file contents
-				if _, err := io.Copy(outFile, tr); err != nil {
-					return fmt.Errorf("failed to copy file contents: %w", err)
-				}
-
-				return nil
-			}
-		}
-	}
-
-	return fmt.Errorf("%w: %s", ErrFileNotFoundInTar, fileName)
-}
-
-// extractDirectoryFromTar extracts all files from a tar reader to a target directory.
-func extractDirectoryFromTar(tarReader io.Reader, targetDir string) error {
-	tr := tar.NewReader(tarReader)
-
-	for {
-		header, err := tr.Next()
-		if err == io.EOF {
-			break
-		}
-		if err != nil {
-			return fmt.Errorf("failed to read tar header: %w", err)
-		}
-
-		// Clean the path to prevent directory traversal
-		cleanName := filepath.Clean(header.Name)
-		if strings.Contains(cleanName, "..") {
-			continue // Skip potentially dangerous paths
-		}
-
-		targetPath := filepath.Join(targetDir, filepath.Base(cleanName))
-
-		switch header.Typeflag {
-		case tar.TypeDir:
-			// Create directory
-			if err := os.MkdirAll(targetPath, os.FileMode(header.Mode)); err != nil {
-				return fmt.Errorf("failed to create directory %s: %w", targetPath, err)
-			}
-		case tar.TypeReg:
-			// Create file
-			outFile, err := os.Create(targetPath)
-			if err != nil {
-				return fmt.Errorf("failed to create file %s: %w", targetPath, err)
-			}
-
-			if _, err := io.Copy(outFile, tr); err != nil {
-				outFile.Close()
-				return fmt.Errorf("failed to copy file contents: %w", err)
-			}
-			outFile.Close()
-
-			// Set file permissions
-			if err := os.Chmod(targetPath, os.FileMode(header.Mode)); err != nil {
-				return fmt.Errorf("failed to set file permissions: %w", err)
-			}
-		}
-	}
-
-	return nil
-}
--- a/cmd/mapresponses/main.go
+++ b/cmd/mapresponses/main.go
@@ -0,0 +1,66 @@
+package main
+
+import (
+	"encoding/json"
+	"errors"
+	"fmt"
+	"os"
+
+	"github.com/creachadair/command"
+	"github.com/creachadair/flax"
+	"github.com/juanfont/headscale/hscontrol/mapper"
+	"github.com/juanfont/headscale/integration/integrationutil"
+)
+
+type MapConfig struct {
+	Directory string `flag:"directory,Directory to read map responses from"`
+}
+
+var (
+	mapConfig            MapConfig
+	errDirectoryRequired = errors.New("directory is required")
+)
+
+func main() {
+	root := command.C{
+		Name: "mapresponses",
+		Help: "MapResponses is a tool to map and compare map responses from a directory",
+		Commands: []*command.C{
+			{
+				Name:     "online",
+				Help:     "",
+				Usage:    "run [test-pattern] [flags]",
+				SetFlags: command.Flags(flax.MustBind, &mapConfig),
+				Run:      runOnline,
+			},
+			command.HelpCommand(nil),
+		},
+	}
+
+	env := root.NewEnv(nil).MergeFlags(true)
+	command.RunOrFail(env, os.Args[1:])
+}
+
+// runOnline reads map responses from the configured directory and writes the expected online map as JSON to stderr.
+func runOnline(env *command.Env) error {
+	if mapConfig.Directory == "" {
+		return errDirectoryRequired
+	}
+
+	resps, err := mapper.ReadMapResponsesFromDirectory(mapConfig.Directory)
+	if err != nil {
+		return fmt.Errorf("reading map responses from directory: %w", err)
+	}
+
+	expected := integrationutil.BuildExpectedOnlineMap(resps)
+
+	out, err := json.MarshalIndent(expected, "", "  ")
+	if err != nil {
+		return fmt.Errorf("marshaling expected online map: %w", err)
+	}
+
+	os.Stderr.Write(out)
+	os.Stderr.Write([]byte("\n"))
+
+	return nil
+}
--- a/config-example.yaml
+++ b/config-example.yaml
@@ -20,6 +20,7 @@ listen_addr: 127.0.0.1:8080

 # Address to listen to /metrics and /debug, you may want
 # to keep this endpoint private to your internal network
+# Use an empty value to disable the metrics listener.
 metrics_listen_addr: 127.0.0.1:9090

 # Address to listen for gRPC.
@@ -49,18 +50,29 @@ noise:
 # List of IP prefixes to allocate tailaddresses from.
 # Each prefix consists of either an IPv4 or IPv6 address,
 # and the associated prefix length, delimited by a slash.
-# It must be within IP ranges supported by the Tailscale
-# client - i.e., subnets of 100.64.0.0/10 and fd7a:115c:a1e0::/48.
-# See below:
-# IPv6: https://github.com/tailscale/tailscale/blob/22ebb25e833264f58d7c3f534a8b166894a89536/net/tsaddr/tsaddr.go#LL81C52-L81C71
+#
+# WARNING: These prefixes MUST be subsets of the standard Tailscale ranges:
+#   - IPv4: 100.64.0.0/10 (CGNAT range)
+#   - IPv6: fd7a:115c:a1e0::/48 (Tailscale ULA range)
+#
+# Using a SUBSET of these ranges is supported and useful if you want to
+# limit IP allocation to a smaller block (e.g., 100.64.0.0/24).
+#
+# Using ranges OUTSIDE of CGNAT/ULA is NOT supported and will cause
+# undefined behaviour. The Tailscale client has hard-coded assumptions
+# about these ranges and will break in subtle, hard-to-debug ways.
+#
+# See:
 # IPv4: https://github.com/tailscale/tailscale/blob/22ebb25e833264f58d7c3f534a8b166894a89536/net/tsaddr/tsaddr.go#L33
-# Any other range is NOT supported, and it will cause unexpected issues.
+# IPv6: https://github.com/tailscale/tailscale/blob/22ebb25e833264f58d7c3f534a8b166894a89536/net/tsaddr/tsaddr.go#LL81C52-L81C71
 prefixes:
  v4: 100.64.0.0/10
  v6: fd7a:115c:a1e0::/48

  # Strategy used for allocation of IPs to nodes, available options:
-  # - sequential (default): assigns the next free IP from the previous given IP.
+  # - sequential (default): assigns the next free IP from the previous given
+  #   IP. A best-effort approach is used and Headscale might leave holes in the
+  #   IP range or fill up existing holes in the IP range.
  # - random: assigns the next free IP from a pseudo-random IP generator (crypto/rand).
  allocation: sequential

@@ -105,7 +117,7 @@ derp:

    # For better connection stability (especially when using an Exit-Node and DNS is not working),
    # it is possible to optionally add the public IPv4 and IPv6 address to the Derp-Map using:
-    ipv4: 1.2.3.4
+    ipv4: 198.51.100.1
    ipv6: 2001:db8::1

  # List of externally available DERP maps encoded in JSON
@@ -128,7 +140,7 @@ derp:
  auto_update_enabled: true

  # How often should we check for DERP updates?
-  update_frequency: 24h
+  update_frequency: 3h

 # Disables the automatic check for headscale updates on startup
 disable_check_updates: false
@@ -225,9 +237,11 @@ tls_cert_path: ""
 tls_key_path: ""

 log:
+  # Valid log levels: panic, fatal, error, warn, info, debug, trace
+  level: info
+
  # Output formatting for logs: text or json
  format: text
-  level: info

 ## Policy
 # headscale supports Tailscale's ACL policies.
@@ -273,9 +287,9 @@ dns:
  # `hostname.base_domain` (e.g., _myhost.example.com_).
  base_domain: example.com

-  # Whether to use the local DNS settings of a node (default) or override the
-  # local DNS settings and force the use of Headscale's DNS configuration.
-  override_local_dns: false
+  # Whether to use the local DNS settings of a node or override the local DNS
+  # settings (default) and force the use of Headscale's DNS configuration.
+  override_local_dns: true

  # List of DNS servers to expose to clients.
  nameservers:
@@ -291,8 +305,7 @@ dns:

    # Split DNS (see https://tailscale.com/kb/1054/dns/),
    # a map of domains and which DNS server to use for each.
-    split:
-      {}
+    split: {}
      # foo.bar.com:
      #   - 1.1.1.1
      # darp.headscale.net:
@@ -358,6 +371,12 @@ unix_socket_permission: "0770"
 #   # required "openid" scope.
 #   scope: ["openid", "profile", "email"]
 #
+#   # Only verified email addresses are synchronized to the user profile by
+#   # default. Unverified emails may be allowed in case an identity provider
+#   # does not send the "email_verified: true" claim or email verification is
+#   # not required.
+#   email_verified_required: true
+#
 #   # Provide custom key/value pairs which get sent to the identity provider's
 #   # authorization endpoint.
 #   extra_params:
@@ -390,11 +409,13 @@ unix_socket_permission: "0770"
 #     method: S256

 # Logtail configuration
-# Logtail is Tailscales logging and auditing infrastructure, it allows the control panel
-# to instruct tailscale nodes to log their activity to a remote server.
+# Logtail is Tailscale's logging and auditing infrastructure. It allows the
+# control panel to instruct tailscale nodes to log their activity to a remote
+# server. To disable logging on the client side, please refer to:
+# https://tailscale.com/kb/1011/log-mesh-traffic#opting-out-of-client-logging
 logtail:
-  # Enable logtail for this headscales clients.
-  # As there is currently no support for overriding the log server in headscale, this is
+  # Enable logtail for tailscale nodes of this Headscale instance.
+  # As there is currently no support for overriding the log server in Headscale, this is
  # disabled by default. Enabling this will make your clients send logs to Tailscale Inc.
  enabled: false

@@ -402,3 +423,24 @@ logtail:
 # default static port 41641. This option is intended as a workaround for some buggy
 # firewall devices. See https://tailscale.com/kb/1181/firewalls/ for more information.
 randomize_client_port: false
+
+# Taildrop configuration
+# Taildrop is the file sharing feature of Tailscale, allowing nodes to send files to each other.
+# https://tailscale.com/kb/1106/taildrop/
+taildrop:
+  # Enable or disable Taildrop for all nodes.
+  # When enabled, nodes can send files to other nodes owned by the same user.
+  # Tagged devices and cross-user transfers are not permitted by Tailscale clients.
+  enabled: true
+
+# Advanced performance tuning parameters.
+# The defaults are carefully chosen and should rarely need adjustment.
+# Only modify these if you have identified a specific performance issue.
+#
+# tuning:
+#   # NodeStore write batching configuration.
+#   # The NodeStore batches write operations before rebuilding peer relationships,
+#   # which is computationally expensive. Batching reduces rebuild frequency.
+#   #
+#   # node_store_batch_size: 100
+#   # node_store_batch_timeout: 500ms
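Reviewer sketch (not part of the patch): the `prefixes` comments above require every configured prefix to be a subset of `100.64.0.0/10` or `fd7a:115c:a1e0::/48`. A minimal Go illustration of that subset rule follows, assuming only the standard library `net/netip` package. The helper names `isSubset` and `checkPrefix` are hypothetical and do not refer to Headscale's own validation code.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Supported ranges, as stated in the config-example.yaml comments.
var (
	tailscaleV4 = netip.MustParsePrefix("100.64.0.0/10")
	tailscaleV6 = netip.MustParsePrefix("fd7a:115c:a1e0::/48")
)

// isSubset reports whether inner lies entirely within outer.
// Hypothetical helper for this sketch.
func isSubset(inner, outer netip.Prefix) bool {
	inner, outer = inner.Masked(), outer.Masked()
	// inner must be at least as specific as outer, and its base
	// address must fall inside outer.
	return inner.Bits() >= outer.Bits() && outer.Contains(inner.Addr())
}

// checkPrefix returns nil if s is a subset of one of the supported ranges.
// Hypothetical helper for this sketch.
func checkPrefix(s string) error {
	p, err := netip.ParsePrefix(s)
	if err != nil {
		return fmt.Errorf("parsing prefix %q: %w", s, err)
	}
	if isSubset(p, tailscaleV4) || isSubset(p, tailscaleV6) {
		return nil
	}
	return fmt.Errorf("prefix %s is outside %s and %s", p, tailscaleV4, tailscaleV6)
}

func main() {
	for _, s := range []string{"100.64.0.0/24", "10.0.0.0/8", "fd7a:115c:a1e0:ab12::/64"} {
		fmt.Printf("%-28s supported=%t\n", s, checkPrefix(s) == nil)
	}
}
```

A check like this could run before editing `prefixes`, so that an out-of-range value is caught before any node receives an address from it.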
--- a/derp-example.yaml
+++ b/derp-example.yaml
@@ -1,5 +1,6 @@
 # If you plan to somehow use headscale, please deploy your own DERP infra: https://tailscale.com/kb/1118/custom-derp-servers/
 regions:
+  1: null # Disable DERP region with ID 1
  900:
    regionid: 900
    regioncode: custom
@@ -7,9 +8,9 @@ regions:
    nodes:
      - name: 900a
        regionid: 900
-        hostname: myderp.mydomain.no
-        ipv4: 123.123.123.123
-        ipv6: "2604:a880:400:d1::828:b001"
+        hostname: myderp.example.com
+        ipv4: 198.51.100.1
+        ipv6: 2001:db8::1
        stunport: 0
        stunonly: false
        derpport: 0
--- a/docs/about/contributing.md
+++ b/docs/about/contributing.md
@@ -1,3 +1,3 @@
 {%
-    include-markdown "../../CONTRIBUTING.md"
+include-markdown "../../CONTRIBUTING.md"
 %}
--- a/docs/about/faq.md
+++ b/docs/about/faq.md
@@ -24,9 +24,12 @@ We are more than happy to exchange emails, or to have dedicated calls before a P

 ## When/Why is Feature X going to be implemented?

-We don't know. We might be working on it. If you're interested in contributing, please post a feature request about it.
+We use [GitHub Milestones to plan for upcoming Headscale releases](https://github.com/juanfont/headscale/milestones).
+Have a look at [our current plan](https://github.com/juanfont/headscale/milestones) to get an idea when a specific
+feature is about to be implemented. The release plan is subject to change at any time.

-Please be aware that there are a number of reasons why we might not accept specific contributions:
+If you're interested in contributing, please post a feature request about it. Please be aware that there are a number of
+reasons why we might not accept specific contributions:

 - It is not possible to implement the feature in a way that makes sense in a self-hosted environment.
 - Given that we are reverse-engineering Tailscale to satisfy our own curiosity, we might be interested in implementing the feature ourselves.
@@ -44,6 +47,15 @@ For convenience, we also [build container images with headscale](../setup/instal
 we don't officially support deploying headscale using Docker**. On our [Discord server](https://discord.gg/c84AZQhmpx)
 we have a "docker-issues" channel where you can ask for Docker-specific help to the community.

+## What is the recommended update path? Can I skip multiple versions while updating?
+
+Please follow the steps outlined in the [upgrade guide](../setup/upgrade.md) to update your existing Headscale
+installation. It is required to update from one stable version to the next (e.g. 0.26.0 → 0.27.1 → 0.28.0) without
+skipping minor versions in between. You should always pick the latest available patch release.
+
+Be sure to check the [changelog](https://github.com/juanfont/headscale/blob/main/CHANGELOG.md) for version-specific
+upgrade instructions and breaking changes.
+
 ## Scaling / How many clients does Headscale support?

 It depends. As often stated, Headscale is not enterprise software and our focus
@@ -51,22 +63,22 @@ is homelabbers and self-hosters. Of course, we do not prevent people from using
 it in a commercial/professional setting and often get questions about scaling.

 Please note that when Headscale is developed, performance is not part of the
-consideration as the main audience is considered to be users with a moddest
+consideration as the main audience is considered to be users with a modest
 amount of devices. We focus on correctness and feature parity with Tailscale
 SaaS over time.

-To understand if you might be able to use Headscale for your usecase, I will
+To understand if you might be able to use Headscale for your use case, I will
 describe two scenarios in an effort to explain what is the central bottleneck
 of Headscale:

 1. An environment with 1000 servers

-   - they rarely "move" (change their endpoints)
-   - new nodes are added rarely
+    - they rarely "move" (change their endpoints)
+    - new nodes are added rarely

-2. An environment with 80 laptops/phones (end user devices)
+1. An environment with 80 laptops/phones (end user devices)

-   - nodes move often, e.g. switching from home to office
+    - nodes move often, e.g. switching from home to office

 Headscale calculates a map of all nodes that need to talk to each other,
 creating this "world map" requires a lot of CPU time. When an event that
@@ -76,7 +88,7 @@ new "world map" is created for every node in the network.
 This means that under certain conditions, Headscale can likely handle 100s
 of devices (maybe more), if there is _little to no change_ happening in the
 network. For example, in Scenario 1, the process of computing the world map is
-extremly demanding due to the size of the network, but when the map has been
+extremely demanding due to the size of the network, but when the map has been
 created and the nodes are not changing, the Headscale instance will likely
 return to a very low resource usage until the next time there is an event
 requiring the new map.
@@ -94,14 +106,14 @@ learn about the current state of the world.
 We expect that the performance will improve over time as we improve the code
 base, but it is not a focus. In general, we will never make the tradeoff to make
 things faster on the cost of less maintainable or readable code. We are a small
-team and have to optimise for maintainabillity.
+team and have to optimise for maintainability.

 ## Which database should I use?

 We recommend the use of SQLite as database for headscale:

 - SQLite is simple to setup and easy to use
- It scales well for all of headscale's usecases
+- It scales well for all of headscale's use cases
 - Development and testing happens primarily on SQLite
 - PostgreSQL is still supported, but is considered to be in "maintenance mode"

@@ -130,7 +142,72 @@ connect back to the administrator's node. Why do all nodes see the administrator
 `tailscale status`?

 This is essentially how Tailscale works. If traffic is allowed to flow in one direction, then both nodes see each other
-in their output of `tailscale status`. Traffic is still filtered according to the ACL, with the exception of `tailscale
-ping` which is always allowed in either direction.
+in their output of `tailscale status`. Traffic is still filtered according to the ACL, with the exception of
+`tailscale ping` which is always allowed in either direction.

 See also <https://tailscale.com/kb/1087/device-visibility>.
+
+## My policy is stored in the database and Headscale refuses to start due to an invalid policy. How can I recover?
+
+Headscale checks if the policy is valid during startup and refuses to start if it detects an error. The error message
+indicates which part of the policy is invalid. Follow these steps to fix your policy:
+
+- Dump the policy to a file: `headscale policy get --bypass-grpc-and-access-database-directly > policy.json`
+- Edit and fix `policy.json`. Use the command `headscale policy check --file policy.json` to validate the policy.
+- Load the modified policy: `headscale policy set --bypass-grpc-and-access-database-directly --file policy.json`
+- Start Headscale as usual.
+
+!!! warning "Full server configuration required"
+
+    The above commands to get/set the policy require a complete server configuration file including database settings. A
+    minimal config to [control Headscale via remote CLI](../ref/api.md#grpc) is not sufficient. You may use
+    `headscale -c /path/to/config.yaml` to specify the path to an alternative configuration file.
+
+## How can I migrate back to the recommended IP prefixes?
+
+Tailscale only supports the IP prefixes `100.64.0.0/10` and `fd7a:115c:a1e0::/48` or smaller subnets thereof. The
+following steps can be used to migrate from unsupported IP prefixes back to the supported and recommended ones.
+
+!!! warning "Backup and test in a demo environment required"
+
+    The commands below update the IP addresses of all nodes in your tailnet and this might have a severe impact in your
+    specific environment. At a minimum:
+
+    - [Create a backup of your database](../setup/upgrade.md#backup)
+    - Test the commands below in a representative demo environment. This allows you to catch subsequent connectivity errors
+      early and see how the tailnet behaves in your specific environment.
+
+- Stop Headscale
+- Restore the default prefixes in the [configuration file](../ref/configuration.md):
+    ```yaml
+    prefixes:
+      v4: 100.64.0.0/10
+      v6: fd7a:115c:a1e0::/48
+    ```
+- Update the `nodes.ipv4` and `nodes.ipv6` columns in the database and assign each node a unique IPv4 and IPv6 address.
+  The following SQL statement assigns IP addresses based on the node ID:
+    ```sql
+    UPDATE nodes
+    SET ipv4=concat('100.64.', id/256, '.', id%256),
+        ipv6=concat('fd7a:115c:a1e0::', format('%x', id));
+    ```
+- Update the [policy](../ref/acls.md) to reflect the IP address changes (if any)
+- Start Headscale
+
+Nodes should reconnect within a few seconds and pick up their newly assigned IP addresses.
+
+## How can I avoid sending logs to Tailscale Inc?
+
+A Tailscale client [collects logs about its operation and connection attempts with other
+clients](https://tailscale.com/kb/1011/log-mesh-traffic#client-logs) and sends them to a central log service operated by
+Tailscale Inc.
+
+Headscale, by default, instructs clients to disable log submission to the central log service. This configuration is
+applied by a client once it has successfully connected with Headscale. See the configuration option `logtail.enabled` in the
+[configuration file](../ref/configuration.md) for details.
+
+Alternatively, logging can also be disabled on the client side. This is independent of Headscale and opting out of
+client logging disables log submission early during client startup. The configuration is operating system specific and
+is usually achieved by setting the environment variable `TS_NO_LOGS_NO_SUPPORT=true` or by passing the flag
+`--no-logs-no-support` to `tailscaled`. See
+<https://tailscale.com/kb/1011/log-mesh-traffic#opting-out-of-client-logging> for details.
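Reviewer sketch (not part of the patch): the SQL statement in the FAQ entry on migrating IP prefixes derives both addresses from the node ID. The Go snippet below reproduces that arithmetic with `net/netip` to preview the result for a given ID. The function name `addressesForNodeID` is hypothetical. The upper bound on the ID is an assumption of this sketch: for IDs above 65535 the third IPv4 octet would exceed 255 and the hexadecimal IPv6 group would exceed four digits, so the sketch rejects them instead of guessing a scheme.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

var errNodeIDTooLarge = errors.New("node ID does not fit the 100.64.x.y scheme")

// addressesForNodeID mirrors the SQL expressions from the FAQ:
//
//	ipv4 = concat('100.64.', id/256, '.', id%256)
//	ipv6 = concat('fd7a:115c:a1e0::', format('%x', id))
//
// Hypothetical helper for this sketch.
func addressesForNodeID(id uint64) (netip.Addr, netip.Addr, error) {
	if id > 0xffff {
		return netip.Addr{}, netip.Addr{}, fmt.Errorf("%w: %d", errNodeIDTooLarge, id)
	}

	v4 := netip.AddrFrom4([4]byte{100, 64, byte(id / 256), byte(id % 256)})

	v6, err := netip.ParseAddr(fmt.Sprintf("fd7a:115c:a1e0::%x", id))
	if err != nil {
		return netip.Addr{}, netip.Addr{}, fmt.Errorf("building IPv6 address: %w", err)
	}

	return v4, v6, nil
}

func main() {
	for _, id := range []uint64{1, 300, 65535} {
		v4, v6, err := addressesForNodeID(id)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("node %d: %s %s\n", id, v4, v6)
	}
}
```

Running this for a handful of node IDs from a database backup helps confirm that the generated addresses are unique and inside the default prefixes before the `UPDATE` is applied.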
--- a/docs/about/features.md
+++ b/docs/about/features.md
@@ -5,30 +5,31 @@ to provide self-hosters and hobbyists with an open-source server they can use fo
 provides on overview of Headscale's feature and compatibility with the Tailscale control server:

 - [x] Full "base" support of Tailscale's features
- [x] Node registration
-    - [x] Interactive
-    - [x] Pre authenticated key
+- [x] [Node registration](../ref/registration.md)
+    - [x] [Web authentication](../ref/registration.md#web-authentication)
+    - [x] [Pre authenticated key](../ref/registration.md#pre-authenticated-key)
 - [x] [DNS](../ref/dns.md)
    - [x] [MagicDNS](https://tailscale.com/kb/1081/magicdns)
    - [x] [Global and restricted nameservers (split DNS)](https://tailscale.com/kb/1054/dns#nameservers)
    - [x] [search domains](https://tailscale.com/kb/1054/dns#search-domains)
    - [x] [Extra DNS records (Headscale only)](../ref/dns.md#setting-extra-dns-records)
 - [x] [Taildrop (File Sharing)](https://tailscale.com/kb/1106/taildrop)
+- [x] [Tags](../ref/tags.md)
 - [x] [Routes](../ref/routes.md)
    - [x] [Subnet routers](../ref/routes.md#subnet-router)
    - [x] [Exit nodes](../ref/routes.md#exit-node)
 - [x] Dual stack (IPv4 and IPv6)
 - [x] Ephemeral nodes
- [x] Embedded [DERP server](https://tailscale.com/kb/1232/derp-servers)
+- [x] Embedded [DERP server](../ref/derp.md)
 - [x] Access control lists ([GitHub label "policy"](https://github.com/juanfont/headscale/labels/policy%20%F0%9F%93%9D))
    - [x] ACL management via API
    - [x] Some [Autogroups](https://tailscale.com/kb/1396/targets#autogroups), currently: `autogroup:internet`,
-      `autogroup:nonroot`, `autogroup:member`, `autogroup:tagged`
+      `autogroup:nonroot`, `autogroup:member`, `autogroup:tagged`, `autogroup:self`
    - [x] [Auto approvers](https://tailscale.com/kb/1337/acl-syntax#auto-approvers) for [subnet
      routers](../ref/routes.md#automatically-approve-routes-of-a-subnet-router) and [exit
      nodes](../ref/routes.md#automatically-approve-an-exit-node-with-auto-approvers)
    - [x] [Tailscale SSH](https://tailscale.com/kb/1193/tailscale-ssh)
-* [x] [Node registration using Single-Sign-On (OpenID Connect)](../ref/oidc.md) ([GitHub label "OIDC"](https://github.com/juanfont/headscale/labels/OIDC))
+- [x] [Node registration using Single-Sign-On (OpenID Connect)](../ref/oidc.md) ([GitHub label "OIDC"](https://github.com/juanfont/headscale/labels/OIDC))
    - [x] Basic registration
    - [x] Update user profile from identity provider
    - [ ] OIDC groups cannot be used in ACLs
--- a/docs/assets/favicon.png
+++ b/docs/assets/favicon.png
--- a/docs/assets/images/headscale-acl-network.png
+++ b/docs/assets/images/headscale-acl-network.png
--- a/docs/assets/logo/headscale3-dots.pdf
+++ b/docs/assets/logo/headscale3-dots.pdf
--- a/docs/assets/logo/headscale3-dots.png
+++ b/docs/assets/logo/headscale3-dots.png
--- a/docs/assets/logo/headscale3-dots.svg
+++ b/docs/assets/logo/headscale3-dots.svg
@@ -1 +1 @@
-<svg xmlns="http://www.w3.org/2000/svg" xml:space="preserve" style="fill-rule:evenodd;clip-rule:evenodd;stroke-linejoin:round;stroke-miterlimit:2" viewBox="0 0 1280 640"><circle cx="141.023" cy="338.36" r="117.472" style="fill:#f8b5cb" transform="matrix(.997276 0 0 1.00556 10.0024 -14.823)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 -3.15847 0)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 -3.15847 115.914)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 148.43 115.914)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 148.851 0)"/><circle cx="805.557" cy="336.915" r="118.199" style="fill:#8d8d8d" transform="matrix(.99196 0 0 1 3.36978 -10.2458)"/><circle cx="805.557" cy="336.915" r="118.199" style="fill:#8d8d8d" transform="matrix(.99196 0 0 1 255.633 -10.2458)"/><path d="M680.282 124.808h-68.093v390.325h68.081v-28.23H640V153.228h40.282v-28.42Z" style="fill:#303030"/><path d="M680.282 124.808h-68.093v390.325h68.081v-28.23H640V153.228h40.282v-28.42Z" style="fill:#303030" transform="matrix(-1 0 0 1 1857.19 0)"/></svg>
+<svg xmlns="http://www.w3.org/2000/svg" xml:space="preserve" style="fill-rule:evenodd;clip-rule:evenodd;stroke-linejoin:round;stroke-miterlimit:2" viewBox="0 0 1280 640"><circle cx="141.023" cy="338.36" r="117.472" style="fill:#f8b5cb" transform="matrix(.997276 0 0 1.00556 10.0024 -14.823)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 -3.15847 0)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 -3.15847 115.914)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 148.43 115.914)"/><circle cx="352.014" cy="268.302" r="33.095" style="fill:#a2a2a2" transform="matrix(1.01749 0 0 1 148.851 0)"/><circle cx="805.557" cy="336.915" r="118.199" style="fill:#8d8d8d" transform="matrix(.99196 0 0 1 3.36978 -10.2458)"/><circle cx="805.557" cy="336.915" r="118.199" style="fill:#8d8d8d" transform="matrix(.99196 0 0 1 255.633 -10.2458)"/><path d="M680.282 124.808h-68.093v390.325h68.081v-28.23H640V153.228h40.282v-28.42Z" style="fill:#303030"/><path d="M680.282 124.808h-68.093v390.325h68.081v-28.23H640V153.228h40.282v-28.42Z" style="fill:#303030" transform="matrix(-1 0 0 1 1857.19 0)"/></svg>
--- a/docs/assets/logo/headscale3_header_stacked_left.pdf
+++ b/docs/assets/logo/headscale3_header_stacked_left.pdf
--- a/docs/assets/logo/headscale3_header_stacked_left.png
+++ b/docs/assets/logo/headscale3_header_stacked_left.png
--- a/docs/assets/logo/headscale3_header_stacked_left.svg
+++ b/docs/assets/logo/headscale3_header_stacked_left.svg
--- a/docs/ref/acls.md
+++ b/docs/ref/acls.md
@@ -9,9 +9,38 @@ When using ACL's the User borders are no longer applied. All machines
 whichever the User have the ability to communicate with other hosts as
 long as the ACL's permits this exchange.

-## ACLs use case example
+## ACL Setup

-Let's build an example use case for a small business (It may be the place where
+To enable and configure ACLs in Headscale, you need to specify the path to your ACL policy file in the `policy.path` key in `config.yaml`.
+
+Your ACL policy file must be formatted using [huJSON](https://github.com/tailscale/hujson).
+
+Info on how these policies are written can be found
+[here](https://tailscale.com/kb/1018/acls/).
+
+Please reload or restart Headscale after updating the ACL file. Headscale may be reloaded either via its systemd service
+(`sudo systemctl reload headscale`) or by sending a SIGHUP signal (`sudo kill -HUP $(pidof headscale)`) to the main
+process. Headscale logs the result of ACL policy processing after each reload.
+
+## Simple Examples
+
+- [**Allow All**](https://tailscale.com/kb/1192/acl-samples#allow-all-default-acl): If you define an ACL file but completely omit the `"acls"` field from its content, Headscale will default to an "allow all" policy. This means all devices connected to your tailnet will be able to communicate freely with each other.
+
+    ```json
+    {}
+    ```
+
+- [**Deny All**](https://tailscale.com/kb/1192/acl-samples#deny-all): To prevent all communication within your tailnet, you can include an empty array for the `"acls"` field in your policy file.
+
+    ```json
+    {
+      "acls": []
+    }
+    ```
+
+## Complex Example
+
+Let's build a more complex example use case for a small business (It may be the place where
 ACL's are the most useful).

 We have a small company with a boss, an admin, two developers and an intern.
@@ -36,11 +65,7 @@ servers.
 - billing.internal
 - router.internal

-![ACL implementation example](../images/headscale-acl-network.png)
-
-## ACL setup
-
-ACLs have to be written in [huJSON](https://github.com/tailscale/hujson).
+![ACL implementation example](../assets/images/headscale-acl-network.png)

 When [registering the servers](../usage/getting-started.md#register-a-node) we
 will need to add the flag `--advertise-tags=tag:<tag1>,tag:<tag2>`, and the user
@@ -49,14 +74,6 @@ tags to a server they can register, the check of the tags is done on headscale
 server and only valid tags are applied. A tag is valid if the user that is
 registering it is allowed to do it.

-To use ACLs in headscale, you must edit your `config.yaml` file. In there you will find a `policy.path` parameter. This
-will need to point to your ACL file. More info on how these policies are written can be found
-[here](https://tailscale.com/kb/1018/acls/).
-
-Please reload or restart Headscale after updating the ACL file. Headscale may be reloaded either via its systemd service
-(`sudo systemctl reload headscale`) or by sending a SIGHUP signal (`sudo kill -HUP $(pidof headscale)`) to the main
-process. Headscale logs the result of ACL policy processing after each reload.
-
 Here are the ACL's to implement the same permissions as above:

 ```json title="acl.json"
@@ -177,13 +194,95 @@ Here are the ACL's to implement the same permissions as above:
      "dst": ["tag:dev-app-servers:80,443"]
    },

-    // We still have to allow internal users communications since nothing guarantees that each user have
-    // their own users.
-    { "action": "accept", "src": ["boss@"], "dst": ["boss@:*"] },
-    { "action": "accept", "src": ["dev1@"], "dst": ["dev1@:*"] },
-    { "action": "accept", "src": ["dev2@"], "dst": ["dev2@:*"] },
-    { "action": "accept", "src": ["admin1@"], "dst": ["admin1@:*"] },
-    { "action": "accept", "src": ["intern1@"], "dst": ["intern1@:*"] }
+    // Allow users to access their own devices using autogroup:self (see below for more details about performance impact)
+    {
+      "action": "accept",
+      "src": ["autogroup:member"],
+      "dst": ["autogroup:self:*"]
+    }
  ]
 }
 ```
+
+## Autogroups
+
+Headscale supports several autogroups that automatically include users, destinations, or devices with specific properties. Autogroups provide a convenient way to write ACL rules without manually listing individual users or devices.
+
+### `autogroup:internet`
+
+Allows access to the internet through [exit nodes](routes.md#exit-node). Can only be used in ACL destinations.
+
+```json
+{
+  "action": "accept",
+  "src": ["group:users"],
+  "dst": ["autogroup:internet:*"]
+}
+```
+
+### `autogroup:member`
+
+Includes all [personal (untagged) devices](registration.md#identity-model).
+
+```json
+{
+  "action": "accept",
+  "src": ["autogroup:member"],
+  "dst": ["tag:prod-app-servers:80,443"]
+}
+```
+
+### `autogroup:tagged`
+
+Includes all devices that [have at least one tag](registration.md#identity-model).
+
+```json
+{
+  "action": "accept",
+  "src": ["autogroup:tagged"],
+  "dst": ["tag:monitoring:9090"]
+}
+```
+
+### `autogroup:self`
+
+!!! warning "The current implementation of `autogroup:self` is inefficient"
+
+Includes devices where the same user is authenticated on both the source and destination. Does not include tagged devices. Can only be used in ACL destinations.
+
+```json
+{
+  "action": "accept",
+  "src": ["autogroup:member"],
+  "dst": ["autogroup:self:*"]
+}
+```
+
+*Using `autogroup:self` may cause performance degradation on the Headscale coordinator server in large deployments, as filter rules must be compiled per-node rather than globally and the current implementation is not very efficient.*
+
+If you experience performance issues, consider using more specific ACL rules or limiting the use of `autogroup:self`.
+
+```json
+{
+  "acls": [
+    // The following rules allow internal users to communicate with their
+    // own nodes in case autogroup:self is causing performance issues.
+    { "action": "accept", "src": ["boss@"], "dst": ["boss@:*"] },
+    { "action": "accept", "src": ["dev1@"], "dst": ["dev1@:*"] },
+    { "action": "accept", "src": ["dev2@"], "dst": ["dev2@:*"] },
+    { "action": "accept", "src": ["admin1@"], "dst": ["admin1@:*"] },
+    { "action": "accept", "src": ["intern1@"], "dst": ["intern1@:*"] }
+  ]
+}
+```
+
+### `autogroup:nonroot`
+
+Used in Tailscale SSH rules to allow access to any user except root. Can only be used in the `users` field of SSH rules.
+
+```json
+{
+  "action": "accept",
+  "src": ["autogroup:member"],
+  "dst": ["autogroup:self"],
+  "users": ["autogroup:nonroot"]
+}
+```
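Reviewer sketch (not part of the patch): the `autogroup:self` section states that the group covers untagged devices of the same user and that rules must be compiled per node. The Go snippet below illustrates why the result depends on the source node. The `node` type and `selfDestinations` function are hypothetical simplifications for illustration and are not Headscale's policy engine.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a tailnet device.
type node struct {
	Name string
	User string
	Tags []string
}

// selfDestinations returns the names of the nodes that src could reach through
// a rule with "dst": ["autogroup:self:*"]: untagged nodes owned by the same
// user as src. A tagged source gets no destinations. The source node itself
// also matches.
func selfDestinations(src node, all []node) []string {
	if len(src.Tags) > 0 {
		return nil
	}

	var out []string
	for _, n := range all {
		if len(n.Tags) == 0 && n.User == src.User {
			out = append(out, n.Name)
		}
	}

	return out
}

func main() {
	nodes := []node{
		{Name: "boss-laptop", User: "boss@"},
		{Name: "boss-phone", User: "boss@"},
		{Name: "dev1-laptop", User: "dev1@"},
		{Name: "prod-db", User: "admin1@", Tags: []string{"tag:prod-databases"}},
	}

	// The destination set differs for every source node, which is why such a
	// rule cannot be expanded once for the whole tailnet.
	for _, src := range nodes {
		fmt.Printf("%s -> %v\n", src.Name, selfDestinations(src, nodes))
	}
}
```

Because the loop has to run once per source node, the cost grows with the number of nodes, which matches the performance caveat in the documentation.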
--- a/docs/ref/api.md
+++ b/docs/ref/api.md
@@ -0,0 +1,129 @@
+# API
+
+Headscale provides a [HTTP REST API](#rest-api) and a [gRPC interface](#grpc) which may be used to integrate a [web
+interface](integration/web-ui.md), [remote control Headscale](#setup-remote-control) or provide a base for custom
+integration and tooling.
+
+Both interfaces require a valid API key before use. To create an API key, log into your Headscale server and generate
+one with the default expiration of 90 days:
+
+```shell
+headscale apikeys create
+```
+
+Copy the output of the command and save it for later. Please note that you cannot retrieve an API key again. If the API
+key is lost, expire the old one, and create a new one.
+
+To list the API keys currently associated with the server:
+
+```shell
+headscale apikeys list
+```
+
+and to expire an API key:
+
+```shell
+headscale apikeys expire --prefix <PREFIX>
+```
+
+## REST API
+
+- API endpoint: `/api/v1`, e.g. `https://headscale.example.com/api/v1`
+- Documentation: `/swagger`, e.g. `https://headscale.example.com/swagger`
+- Headscale Version: `/version`, e.g. `https://headscale.example.com/version`
+- Authenticate using HTTP Bearer authentication by sending the [API key](#api) with the HTTP `Authorization: Bearer <API_KEY>` header.
+
+Start by [creating an API key](#api) and test it with the examples below. Read the API documentation provided by your
+Headscale server at `/swagger` for details.
+
+=== "Get details for all users"
+
+    ```console
+    curl -H "Authorization: Bearer <API_KEY>" \
+        https://headscale.example.com/api/v1/user
+    ```
+
+=== "Get details for user 'bob'"
+
+    ```console
+    curl -H "Authorization: Bearer <API_KEY>" \
+        "https://headscale.example.com/api/v1/user?name=bob"
+    ```
+
+=== "Register a node"
+
+    ```console
+    curl -H "Authorization: Bearer <API_KEY>" \
+        --json '{"user": "<USER>", "authId": "AUTH_ID>"}' \
+        https://headscale.example.com/api/v1/auth/register
+    ```
+
+## gRPC
+
+The gRPC interface can be used to control a Headscale instance from a remote machine with the `headscale` binary.
+
+### Prerequisite
+
+- A workstation to run `headscale` (any supported platform, e.g. Linux).
+- A Headscale server with gRPC enabled.
+- Connections to the gRPC port (default: `50443`) are allowed.
+- Remote access requires an encrypted connection via TLS.
+- An [API key](#api) to authenticate with the Headscale server.
+
+### Setup remote control
+
+1. Download the [`headscale` binary from GitHub's release page](https://github.com/juanfont/headscale/releases). Make
+   sure to use the same version as on the server.
+
+1. Put the binary somewhere in your `PATH`, e.g. `/usr/local/bin/headscale`
+
+1. Make `headscale` executable: `chmod +x /usr/local/bin/headscale`
+
+1. [Create an API key](#api) on the Headscale server.
+
+1. Provide the connection parameters for the remote Headscale server either via a minimal YAML configuration file or
+   via environment variables:
+
+    === "Minimal YAML configuration file"
+
+        ```yaml title="config.yaml"
+        cli:
+            address: <HEADSCALE_ADDRESS>:<PORT>
+            api_key: <API_KEY>
+        ```
+
+    === "Environment variables"
+
+        ```shell
+        export HEADSCALE_CLI_ADDRESS="<HEADSCALE_ADDRESS>:<PORT>"
+        export HEADSCALE_CLI_API_KEY="<API_KEY>"
+        ```
+
+    This instructs the `headscale` binary to connect to a remote instance at `<HEADSCALE_ADDRESS>:<PORT>`, instead of
+    connecting to the local instance.
+
+1. Test the connection by listing all nodes:
+
+    ```shell
+    headscale nodes list
+    ```
+
+    You should now see a list of your nodes and be able to control the Headscale server from your
+    workstation.
+
+### Behind a proxy
+
+It's possible to run the gRPC remote endpoint behind a reverse proxy, like Nginx, and have it run on the _same_ port as Headscale.
+
+While this is _not a supported_ feature, an example of how this can be set up on
+[NixOS is shown here](https://github.com/kradalby/dotfiles/blob/4489cdbb19cddfbfae82cd70448a38fde5a76711/machines/headscale.oracldn/headscale.nix#L61-L91).
+
+### Troubleshooting
+
+- Make sure you have the _same_ Headscale version on your server and workstation.
+- Ensure that connections to the gRPC port are allowed.
+- Verify that your TLS certificate is valid and trusted.
+- If you don't have access to a trusted certificate (e.g. from Let's Encrypt), either:
+    - Add your self-signed certificate to the trust store of your OS _or_
+    - Disable certificate verification by either setting `cli.insecure: true` in the configuration file or by setting
+      `HEADSCALE_CLI_INSECURE=1` via an environment variable. We do **not** recommend disabling certificate verification.
--- a/docs/ref/configuration.md
+++ b/docs/ref/configuration.md
@@ -17,8 +17,8 @@

    === "View on GitHub"

-        * Development version: <https://github.com/juanfont/headscale/blob/main/config-example.yaml>
-        * Version {{ headscale.version }}: <https://github.com/juanfont/headscale/blob/v{{ headscale.version }}/config-example.yaml>
+        - Development version: <https://github.com/juanfont/headscale/blob/main/config-example.yaml>
+        - Version {{ headscale.version }}: <https://github.com/juanfont/headscale/blob/v{{ headscale.version }}/config-example.yaml>

    === "Download with `wget`"

--- a/docs/ref/debug.md
+++ b/docs/ref/debug.md
@@ -0,0 +1,118 @@
+# Debugging and troubleshooting
+
+Headscale and Tailscale provide debug and introspection capabilities that can be helpful when things don't work as
+expected. This page explains some debugging techniques to help pinpoint problems.
+
+Please also have a look at [Tailscale's Troubleshooting guide](https://tailscale.com/kb/1023/troubleshooting). It offers
+many tips and suggestions to troubleshoot common issues.
+
+## Tailscale
+
+The Tailscale client itself offers many commands to introspect its state as well as the state of the network:
+
+- [Check local network conditions](https://tailscale.com/kb/1080/cli#netcheck): `tailscale netcheck`
+- [Get the client status](https://tailscale.com/kb/1080/cli#status): `tailscale status --json`
+- [Get DNS status](https://tailscale.com/kb/1080/cli#dns): `tailscale dns status --all`
+- Client logs: `tailscale debug daemon-logs`
+- Client netmap: `tailscale debug netmap`
+- Test DERP connection: `tailscale debug derp headscale`
+- And many more, see: `tailscale debug --help`
+
+Many of the commands are helpful when trying to understand differences between Headscale and Tailscale SaaS.
+
+## Headscale
+
+### Application logging
+
+The log levels `debug` and `trace` can be useful to get more information from Headscale.
+
+```yaml hl_lines="3"
+log:
+  # Valid log levels: panic, fatal, error, warn, info, debug, trace
+  level: debug
+```
+
+### Database logging
+
+The database debug mode logs all database queries. Enable it to see how Headscale interacts with its database. This also
+requires the application log level to be set to either `debug` or `trace`.
+
+```yaml hl_lines="3 7"
+database:
+  # Enable debug mode. This setting requires the log.level to be set to "debug" or "trace".
+  debug: true
+
+log:
+  # Valid log levels: panic, fatal, error, warn, info, debug, trace
+  level: debug
+```
+
+### Metrics and debug endpoint
+
+Headscale provides a metrics and debug endpoint. It allows you to introspect different aspects, such as:
+
+- Information about the Go runtime, memory usage and statistics
+- Connected nodes and pending registrations
+- Active ACLs, filters and SSH policy
+- Current DERPMap
+- Prometheus metrics
+
+!!! warning "Keep the metrics and debug endpoint private"
+
+    The listen address and port can be configured with the `metrics_listen_addr` variable in the [configuration
+    file](./configuration.md). By default it listens on localhost, port 9090.
+
+    Keep the metrics and debug endpoint private to your internal network and don't expose it to the Internet.
+
+    The metrics and debug interface can be disabled completely by setting `metrics_listen_addr: null` in the
+    [configuration file](./configuration.md).
+
+Query metrics via <http://localhost:9090/metrics> and get an overview of available debug information via
+<http://localhost:9090/debug/>. Even when the endpoint listens on all interfaces, only the metrics may be queried from
+outside localhost; the debug interface is subject to additional protection.
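+
+As an illustration of the settings described above, a minimal sketch of the corresponding configuration could look like
+this. The explicit `127.0.0.1` address is an assumption that spells out the "localhost, port 9090" default.
+
+```yaml title="config.yaml"
+# Assumption: 127.0.0.1:9090 stands for the documented default (localhost, port 9090).
+metrics_listen_addr: 127.0.0.1:9090
+
+# To disable the metrics and debug endpoint completely, use instead:
+# metrics_listen_addr: null
+```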
+
+=== "Direct access"
+
+    Access the debug interface directly on the server where Headscale is installed.
+
+    ```console
+    curl http://localhost:9090/debug/
+    ```
+
+=== "SSH port forwarding"
+
+    Use SSH port forwarding to forward Headscale's metrics and debug port to your device.
+
+    ```console
+    ssh <HEADSCALE_SERVER> -L 9090:localhost:9090
+    ```
+
+    Access the debug interface on your device by opening <http://localhost:9090/debug/> in your web browser.
+
+=== "Via debug key"
+
+    The access control of the debug interface supports the use of a debug key. Traffic is accepted if the path to a
+    debug key is set via the environment variable `TS_DEBUG_KEY_PATH` and the debug key is sent as the value of the
+    `debugkey` parameter with each request.
+
+    ```console
+    openssl rand -hex 32 | tee debugkey.txt
+    export TS_DEBUG_KEY_PATH=debugkey.txt
+    headscale serve
+    ```
+
+    Access the debug interface on your device by opening `http://<IP_OF_HEADSCALE>:9090/debug/?debugkey=<DEBUG_KEY>` in
+    your web browser. The `debugkey` parameter must be sent with every request.
+
+=== "Via debug IP address"
+
+    The debug endpoint expects traffic from localhost. A different debug IP address may be configured by setting the
+    `TS_ALLOW_DEBUG_IP` environment variable before starting Headscale. The debug IP address is ignored when the HTTP
+    header `X-Forwarded-For` is present.
+
+    ```console
+    export TS_ALLOW_DEBUG_IP=192.168.0.10       # IP address of your device
+    headscale serve
+    ```
+
+    Access the debug interface on your device by opening `http://<IP_OF_HEADSCALE>:9090/debug/` in your web browser.
--- a/docs/ref/derp.md
+++ b/docs/ref/derp.md
@@ -0,0 +1,174 @@
+# DERP
+
+A [DERP (Designated Encrypted Relay for Packets) server](https://tailscale.com/kb/1232/derp-servers) is mainly used to
+relay traffic between two nodes in case a direct connection can't be established. Headscale provides an embedded DERP
+server to ensure seamless connectivity between nodes.
+
+## Configuration
+
+DERP related settings are configured within the `derp` section of the [configuration file](./configuration.md). The
+following sections only use a few of the available settings. Check the [example configuration](./configuration.md) for
+all available configuration options.
+
+### Enable embedded DERP
+
+Headscale ships with an embedded DERP server which makes it easy to run your own self-hosted DERP server. The embedded
+DERP server is disabled by default and needs to be enabled. In addition, you should configure the public IPv4 and public
+IPv6 address of your Headscale server for improved connection stability:
+
+```yaml title="config.yaml" hl_lines="3-5"
+derp:
+  server:
+    enabled: true
+    ipv4: 198.51.100.1
+    ipv6: 2001:db8::1
+```
+
+Keep in mind that [additional ports are needed to run a DERP server](../setup/requirements.md#ports-in-use). Besides
+relaying traffic, it also uses STUN (udp/3478) to help clients discover their public IP addresses and perform NAT
+traversal. [Check DERP server connectivity](#check-derp-server-connectivity) to see if everything works.
+
+### Remove Tailscale's DERP servers
+
+Once enabled, Headscale's embedded DERP is added to the list of free-to-use [DERP
+servers](https://tailscale.com/kb/1232/derp-servers) offered by Tailscale Inc. To only use Headscale's embedded DERP
+server, disable the loading of the default DERP map:
+
+```yaml title="config.yaml" hl_lines="6"
+derp:
+  server:
+    enabled: true
+    ipv4: 198.51.100.1
+    ipv6: 2001:db8::1
+  urls: []
+```
+
+!!! warning "Single point of failure"
+
+    Removing Tailscale's DERP servers means that there is now just a single DERP server available for clients. This is a
+    single point of failure and could hamper connectivity.
+
+    [Check DERP server connectivity](#check-derp-server-connectivity) with your embedded DERP server before removing
+    Tailscale's DERP servers.
+
+### Customize DERP map
+
+The DERP map offered to clients can be customized with a [dedicated YAML-configuration
+file](https://github.com/juanfont/headscale/blob/main/derp-example.yaml). This allows you to modify previously loaded DERP
+maps fetched via URL or to offer your own, custom DERP servers to nodes.
+
+=== "Remove specific DERP regions"
+
+    The free-to-use [DERP servers](https://tailscale.com/kb/1232/derp-servers) are organized into regions via a region
+    ID. You can explicitly disable a specific region by setting its region ID to `null`. The following sample
+    `derp.yaml` disables the New York DERP region (which has the region ID 1):
+
+    ```yaml title="derp.yaml"
+    regions:
+      1: null
+    ```
+
+    Use the following configuration to serve the default DERP map (excluding New York) to nodes:
+
+    ```yaml title="config.yaml" hl_lines="6 7"
+    derp:
+      server:
+        enabled: false
+      urls:
+        - https://controlplane.tailscale.com/derpmap/default
+      paths:
+        - /etc/headscale/derp.yaml
+    ```
+
+=== "Provide custom DERP servers"
+
+    The following sample `derp.yaml` references two custom regions (`custom-east` with ID 900 and `custom-west` with ID 901)
+    with one custom DERP server in each region. Each DERP server offers DERP relay via HTTPS on tcp/443, support for captive
+    portal checks via HTTP on tcp/80 and STUN on udp/3478. See the definitions of
+    [DERPMap](https://pkg.go.dev/tailscale.com/tailcfg#DERPMap),
+    [DERPRegion](https://pkg.go.dev/tailscale.com/tailcfg#DERPRegion) and
+    [DERPNode](https://pkg.go.dev/tailscale.com/tailcfg#DERPNode) for all available options.
+
+    ```yaml title="derp.yaml"
+    regions:
+      900:
+        regionid: 900
+        regioncode: custom-east
+        regionname: My region (east)
+        nodes:
+          - name: 900a
+            regionid: 900
+            hostname: derp900a.example.com
+            ipv4: 198.51.100.1
+            ipv6: 2001:db8::1
+            canport80: true
+      901:
+        regionid: 901
+        regioncode: custom-west
+        regionname: My region (west)
+        nodes:
+          - name: 901a
+            regionid: 901
+            hostname: derp901a.example.com
+            ipv4: 198.51.100.2
+            ipv6: 2001:db8::2
+            canport80: true
+    ```
+
+    Use the following configuration to only serve the two DERP servers from the above `derp.yaml`:
+
+    ```yaml title="config.yaml" hl_lines="5 6"
+    derp:
+      server:
+        enabled: false
+      urls: []
+      paths:
+        - /etc/headscale/derp.yaml
+    ```
+
+Independent of the custom DERP map, you may choose to [enable the embedded DERP server and have it automatically added
+to the custom DERP map](#enable-embedded-derp).
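+
+A minimal sketch of such a combined setup could look as follows. The addresses are placeholders and the option
+`automatically_add_embedded_derp_region` is assumed from Headscale's `config-example.yaml`; check the example
+configuration of your version before relying on it.
+
+```yaml title="config.yaml"
+derp:
+  server:
+    enabled: true
+    # Placeholder addresses for the Headscale server itself.
+    ipv4: 198.51.100.3
+    ipv6: 2001:db8::3
+    # Assumption: option name taken from config-example.yaml.
+    automatically_add_embedded_derp_region: true
+  urls: []
+  paths:
+    - /etc/headscale/derp.yaml
+```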
+
+### Verify clients
+
+Access to DERP servers can be restricted to nodes that are members of your Tailnet. Relay access is denied for unknown
+clients.
+
+=== "Embedded DERP"
+
+    Client verification is enabled by default.
+
+    ```yaml title="config.yaml" hl_lines="3"
+    derp:
+      server:
+        verify_clients: true
+    ```
+
+=== "3rd-party DERP"
+
+    Tailscale's `derper` provides two parameters to configure client verification:
+
+    - Use the `-verify-client-url` parameter of the `derper` and point it towards the `/verify` endpoint of your
+      Headscale server (e.g. `https://headscale.example.com/verify`). The DERP server will query your Headscale instance
+      as soon as a client connects with it to ask whether access should be allowed or denied. Access is allowed if
+      Headscale knows about the connecting client and denied otherwise.
+    - The parameter `-verify-client-url-fail-open` controls what should happen when the DERP server can't reach the
+      Headscale instance. By default, it will allow access if Headscale is unreachable.
+
+## Check DERP server connectivity
+
+Any Tailscale client may be used to introspect the DERP map and to check for connectivity issues with DERP servers.
+
+- Display DERP map: `tailscale debug derp-map`
+- Check connectivity with the embedded DERP[^1]: `tailscale debug derp headscale`
+
+Additional DERP related metrics and information are available via the [metrics and debug
+endpoint](./debug.md#metrics-and-debug-endpoint).
+
+## Limitations
+
+- The embedded DERP server can't be used for Tailscale's captive portal checks as it doesn't support the `/generate_204`
+  endpoint via HTTP on port tcp/80.
+- There are no speed or throughput optimisations; the main purpose is to assist in node connectivity.
+
+[^1]: This assumes that the default region code of the [configuration file](./configuration.md) is used.
--- a/docs/ref/dns.md
+++ b/docs/ref/dns.md
@@ -1,7 +1,7 @@
 # DNS

 Headscale supports [most DNS features](../about/features.md) from Tailscale. DNS related settings can be configured
-within `dns` section of the [configuration file](./configuration.md).
+within the `dns` section of the [configuration file](./configuration.md).

 ## Setting extra DNS records

@@ -23,9 +23,9 @@ hostname and port combination "http://hostname-in-magic-dns.myvpn.example.com:30

 !!! warning "Limitations"

-    Currently, [only A and AAAA records are processed by Tailscale](https://github.com/tailscale/tailscale/blob/v1.78.3/ipn/ipnlocal/local.go#L4461-L4479).
+    Currently, [only A and AAAA records are processed by Tailscale](https://github.com/tailscale/tailscale/blob/v1.86.5/ipn/ipnlocal/node_backend.go#L662).

-1.  Configure extra DNS records using one of the available configuration options:
+1. Configure extra DNS records using one of the available configuration options:

    === "Static entries, via `dns.extra_records`"

@@ -66,12 +66,12 @@ hostname and port combination "http://hostname-in-magic-dns.myvpn.example.com:30

        !!! tip "Good to know"

-            * The `dns.extra_records_path` option in the [configuration file](./configuration.md) needs to reference the
+            - The `dns.extra_records_path` option in the [configuration file](./configuration.md) needs to reference the
              JSON file containing extra DNS records.
-            * Be sure to "sort keys" and produce a stable output in case you generate the JSON file with a script.
+            - Be sure to "sort keys" and produce a stable output in case you generate the JSON file with a script.
              Headscale uses a checksum to detect changes to the file and a stable output avoids unnecessary processing.

-1.  Verify that DNS records are properly set using the DNS querying tool of your choice:
+1. Verify that DNS records are properly set using the DNS querying tool of your choice:

    === "Query with dig"

@@ -87,7 +87,7 @@ hostname and port combination "http://hostname-in-magic-dns.myvpn.example.com:30
        100.64.0.3
        ```

-1.  Optional: Setup the reverse proxy
+1. Optional: Setup the reverse proxy

    The motivating example here was to be able to access internal monitoring services on the same host without
    specifying a port, depicted as NGINX configuration snippet:
--- a/docs/ref/integration/reverse-proxy.md
+++ b/docs/ref/integration/reverse-proxy.md
@@ -13,7 +13,7 @@ Running headscale behind a reverse proxy is useful when running multiple applica

 The reverse proxy MUST be configured to support WebSockets to communicate with Tailscale clients.

-WebSockets support is also required when using the headscale embedded DERP server. In this case, you will also need to expose the UDP port used for STUN (by default, udp/3478). Please check our [config-example.yaml](https://github.com/juanfont/headscale/blob/main/config-example.yaml).
+WebSockets support is also required when using the Headscale [embedded DERP server](../derp.md). In this case, you will also need to expose the UDP port used for STUN (by default, udp/3478). Please check our [config-example.yaml](https://github.com/juanfont/headscale/blob/main/config-example.yaml).

 ### Cloudflare
