Compare commits

...

8 Commits

Author SHA1 Message Date
Kristoffer Dalby
72fcb93ef3 cli: ensure tagged-devices is included in profile list (#2991) 2026-01-09 16:31:23 +01:00
Kristoffer Dalby
f5c779626a nix: use testers.nixosTest instead of nixosTest
nixosTest was renamed to testers.nixosTest in nixpkgs.
2026-01-09 12:57:32 +01:00
Kristoffer Dalby
d227b3a135 docs: update integration testing docs for concurrent execution
Update documentation to reflect the new concurrent test execution
capabilities and add guidance on run ID isolation.

AGENTS.md:
- Add examples for running multiple tests concurrently
- Document run ID format and container naming conventions
- Update "Critical Notes" to explain isolation mechanisms

.claude/agents/headscale-integration-tester.md:
- Add "Concurrent Execution and Run ID Isolation" section
- Document forbidden and safe operations for cleanup
- Add "Agent Session Isolation Rules" for multi-agent environments
- Add 6th core responsibility about concurrent execution awareness
- Add ISOLATION PRINCIPLE to critical principles
- Update pre-test cleanup documentation
2026-01-09 12:34:16 +01:00
Kristoffer Dalby
0bcfdc29ad cmd/hi: enable concurrent test execution
Remove the concurrent test prevention logic and update cleanup to use
run ID-based isolation, allowing multiple tests to run simultaneously.

Changes:
- cleanup: Add killTestContainersByRunID() to clean only containers
  belonging to a specific run, add cleanupStaleTestContainers() to
  remove only stopped/exited containers without affecting running tests
- docker: Remove RunningTestInfo, checkForRunningTests(), and related
  error types, update cleanupAfterTest() to use run ID-based cleanup
- run: Remove Force flag and concurrent test prevention check

The test runner now:
- Allows multiple concurrent test runs on the same Docker daemon
- Cleans only stale containers before tests (not running ones)
- Cleans only containers with matching run ID after tests
- Prints run ID and monitoring info for operator visibility
2026-01-09 12:34:16 +01:00
Kristoffer Dalby
87c230d251 integration: add run ID isolation for concurrent test execution
Add run ID-based isolation to container naming and network setup to
enable multiple integration tests to run concurrently on the same
Docker daemon without conflicts.

Changes:
- hsic: Add run ID prefix to headscale container names and use dynamic
  port allocation for metrics endpoint (port 0 lets kernel assign)
- tsic: Add run ID prefix to tailscale container names
- dsic: Add run ID prefix to DERP container names
- scenario: Use run ID-aware test suite container name for network setup

Container naming now follows: {type}-{runIDShort}-{identifier}-{hash}
Example: ts-mdjtzx-1-74-fgdyls, hs-mdjtzx-pingallbyip-abc123

The run ID is obtained from HEADSCALE_INTEGRATION_RUN_ID environment
variable via dockertestutil.GetIntegrationRunID().
2026-01-09 12:34:16 +01:00
github-actions[bot]
84c092a9f9 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/35f5903' (2025-10-15)
  → 'github:NixOS/nixpkgs/3edc4a3' (2025-12-27)
2025-12-28 08:25:28 +01:00
Florian Preinstorfer
9146140217 Add headscale-operator
Ref: #1523
2025-12-23 20:22:57 +01:00
Kristoffer Dalby
5103b35f3c sqliteconfig: add config opt for tx locking
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-22 14:01:40 +01:00
29 changed files with 697 additions and 244 deletions

View File

@@ -71,7 +71,7 @@ go run ./cmd/hi run "TestName" --timeout=60s
- **Slow tests** (5+ min): Node expiration, HA failover
- **Long-running tests** (10+ min): `TestNodeOnlineStatus` runs for 12 minutes
**CRITICAL**: Only ONE test can run at a time due to Docker port conflicts and resource constraints.
**CONCURRENT EXECUTION**: Multiple tests CAN run simultaneously. Each test run gets a unique Run ID for isolation. See "Concurrent Execution and Run ID Isolation" section below.
## Test Artifacts and Log Analysis
@@ -98,6 +98,97 @@ When tests fail, examine artifacts in this specific order:
4. **Client status dumps** (`*_status.json`): Network state and peer connectivity information
5. **Database snapshots** (`.db` files): For data consistency and state persistence issues
## Concurrent Execution and Run ID Isolation
### Overview
The integration test system supports running multiple tests concurrently on the same Docker daemon. Each test run is isolated through a unique Run ID that ensures containers, networks, and cleanup operations don't interfere with each other.
### Run ID Format and Usage
Each test run generates a unique Run ID in the format: `YYYYMMDD-HHMMSS-{6-char-hash}`
- Example: `20260109-104215-mdjtzx`
The Run ID is used for:
- **Container naming**: `ts-{runIDShort}-{version}-{hash}` (e.g., `ts-mdjtzx-1-74-fgdyls`)
- **Docker labels**: All containers get `hi.run-id={runID}` label
- **Log directories**: `control_logs/{runID}/`
- **Cleanup isolation**: Only containers with matching run ID are cleaned up
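For illustration only, a minimal Go sketch of producing an ID in this format and reading it back from `HEADSCALE_INTEGRATION_RUN_ID` (the helper names and character set here are assumptions, not the actual `cmd/hi` or `dockertestutil` implementation):

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"os"
	"time"
)

const runIDChars = "abcdefghijklmnopqrstuvwxyz0123456789"

// newRunID returns an ID like "20260109-104215-mdjtzx":
// a timestamp plus a 6-character random suffix.
func newRunID() string {
	suffix := make([]byte, 6)
	for i := range suffix {
		n, _ := rand.Int(rand.Reader, big.NewInt(int64(len(runIDChars))))
		suffix[i] = runIDChars[n.Int64()]
	}

	return fmt.Sprintf("%s-%s", time.Now().Format("20060102-150405"), suffix)
}

func main() {
	// Test helpers read the ID back from the environment
	// (HEADSCALE_INTEGRATION_RUN_ID, per the commit message above).
	runID := os.Getenv("HEADSCALE_INTEGRATION_RUN_ID")
	if runID == "" {
		runID = newRunID()
	}

	fmt.Println("Run ID:", runID)
	if len(runID) >= 6 {
		// The last 6 characters are the short form used in container names.
		fmt.Println("short:", runID[len(runID)-6:])
	}
}
```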
### Container Isolation Mechanisms
1. **Unique Container Names**: Each container includes the run ID for identification
2. **Docker Labels**: `hi.run-id` and `hi.test-type` labels on all containers
3. **Dynamic Port Allocation**: All ports use `{HostPort: "0"}` to let kernel assign free ports
4. **Per-Run Networks**: Network names include scenario hash for isolation
5. **Isolated Cleanup**: `killTestContainersByRunID()` only removes containers matching the run ID
### ⚠️ CRITICAL: Never Interfere with Other Test Runs
**FORBIDDEN OPERATIONS** when other tests may be running:
```bash
# ❌ NEVER do global container cleanup while tests are running
docker rm -f $(docker ps -q --filter "name=hs-")
docker rm -f $(docker ps -q --filter "name=ts-")
# ❌ NEVER kill all test containers
# This will destroy other agents' test sessions!
# ❌ NEVER prune all Docker resources during active tests
docker system prune -f # Only safe when NO tests are running
```
**SAFE OPERATIONS**:
```bash
# ✅ Clean up only YOUR test run's containers (by run ID)
# The test runner does this automatically via cleanup functions
# ✅ Clean stale (stopped/exited) containers only
# Pre-test cleanup only removes stopped containers, not running ones
# ✅ Check what's running before cleanup
docker ps --filter "name=headscale-test-suite" --format "{{.Names}}"
```
### Running Concurrent Tests
```bash
# Start multiple tests in parallel - each gets unique run ID
go run ./cmd/hi run "TestPingAllByIP" &
go run ./cmd/hi run "TestACLAllowUserDst" &
go run ./cmd/hi run "TestOIDCAuthenticationPingAll" &
# Monitor running test suites
docker ps --filter "name=headscale-test-suite" --format "table {{.Names}}\t{{.Status}}"
```
### Agent Session Isolation Rules
When working as an agent:
1. **Your run ID is unique**: Each test you start gets its own run ID
2. **Never clean up globally**: Only use run ID-specific cleanup
3. **Check before cleanup**: Verify no other tests are running if you need to prune resources
4. **Respect other sessions**: Other agents may have tests running concurrently
5. **Log directories are isolated**: Your artifacts are in `control_logs/{your-run-id}/`
### Identifying Your Containers
Your test containers can be identified by:
- The run ID in the container name
- The `hi.run-id` Docker label
- The test suite container: `headscale-test-suite-{your-run-id}`
```bash
# List containers for a specific run ID
docker ps --filter "label=hi.run-id=20260109-104215-mdjtzx"
# Get your run ID from the test output
# Look for: "Run ID: 20260109-104215-mdjtzx"
```
## Common Failure Patterns and Root Cause Analysis
### CRITICAL MINDSET: Code Issues vs Infrastructure Issues
@@ -250,10 +341,10 @@ require.NotNil(t, targetNode, "should find expected node")
- **Detection**: No progress in logs for >2 minutes during initialization
- **Solution**: `docker system prune -f` and retry
3. **Docker Port Conflicts**: Multiple tests trying to use same ports
- **Pattern**: "bind: address already in use" errors
- **Detection**: Port binding failures in Docker logs
- **Solution**: Only run ONE test at a time
3. **Docker Resource Exhaustion**: Too many concurrent tests overwhelming system
- **Pattern**: Container creation timeouts, OOM kills, slow test execution
- **Detection**: System load high, Docker daemon slow to respond
- **Solution**: Reduce number of concurrent tests, wait for completion before starting more
**CODE ISSUES (99% of failures)**:
1. **Route Approval Process Failures**: Routes not getting approved when they should be
@@ -273,12 +364,22 @@ require.NotNil(t, targetNode, "should find expected node")
### Critical Test Environment Setup
**Pre-Test Cleanup (MANDATORY)**:
**Pre-Test Cleanup**:
The test runner automatically handles cleanup:
- **Before test**: Removes only stale (stopped/exited) containers - does NOT affect running tests
- **After test**: Removes only containers belonging to the specific run ID
```bash
# ALWAYS run this before each test
# Only clean old log directories if disk space is low
rm -rf control_logs/202507*
docker system prune -f
df -h # Verify sufficient disk space
# SAFE: Clean only stale/stopped containers (does not affect running tests)
# The test runner does this automatically via cleanupStaleTestContainers()
# ⚠️ DANGEROUS: Only use when NO tests are running
docker system prune -f
```
**Environment Verification**:
@@ -286,8 +387,8 @@ df -h # Verify sufficient disk space
# Verify system readiness
go run ./cmd/hi doctor
# Check for running containers that might conflict
docker ps
# Check what tests are currently running (ALWAYS check before global cleanup)
docker ps --filter "name=headscale-test-suite" --format "{{.Names}}"
```
### Specific Test Categories and Known Issues
@@ -756,8 +857,14 @@ assert.EventuallyWithT(t, func(c *assert.CollectT) {
- **Why security focus**: Integration tests are the last line of defense against security regressions
- **EventuallyWithT Usage**: Proper use prevents race conditions without weakening security assertions
6. **Concurrent Execution Awareness**: Respect run ID isolation and never interfere with other agents' test sessions. Each test run has a unique run ID - only clean up YOUR containers (by run ID label), never perform global cleanup while tests may be running.
- **Why this matters**: Multiple agents/users may run tests concurrently on the same Docker daemon
- **Key Rule**: NEVER use global container cleanup commands - the test runner handles cleanup automatically per run ID
**CRITICAL PRINCIPLE**: Test expectations are sacred contracts that define correct system behavior. When tests fail, fix the code to match the test, never change the test to match broken code. Only timing and observability improvements are allowed - business logic expectations are immutable.
**ISOLATION PRINCIPLE**: Each test run is isolated by its unique Run ID. Never interfere with other test sessions. The system handles cleanup automatically - manual global cleanup commands are forbidden when other tests may be running.
**EventuallyWithT PRINCIPLE**: Every external call to headscale server or tailscale client must be wrapped in EventuallyWithT. Follow the five key rules strictly: one external call per block, proper variable scoping, no nesting, use CollectT for assertions, and provide descriptive messages.
**Remember**: Test failures are usually code issues in Headscale that need to be fixed, not infrastructure problems to be ignored. Use the specific debugging workflows and failure patterns documented above to efficiently identify root causes. Infrastructure issues have very specific signatures - everything else is code-related.

View File

@@ -165,6 +165,7 @@ jobs:
- TestPreAuthKeyCommandWithoutExpiry
- TestPreAuthKeyCommandReusableEphemeral
- TestPreAuthKeyCorrectUserLoggedInCommand
- TestTaggedNodesCLIOutput
- TestApiKeyCommand
- TestNodeCommand
- TestNodeExpireCommand

View File

@@ -405,13 +405,29 @@ go run ./cmd/hi run "TestName" --postgres
# Pattern matching for related tests
go run ./cmd/hi run "TestPattern*"
# Run multiple tests concurrently (each gets isolated run ID)
go run ./cmd/hi run "TestPingAllByIP" &
go run ./cmd/hi run "TestACLAllowUserDst" &
go run ./cmd/hi run "TestOIDCAuthenticationPingAll" &
```
**Concurrent Execution Support**:
The test runner supports running multiple tests concurrently on the same Docker daemon:
- Each test run gets a **unique Run ID** (format: `YYYYMMDD-HHMMSS-{6-char-hash}`)
- All containers are labeled with `hi.run-id` for isolation
- Container names include the run ID for easy identification (e.g., `ts-{runID}-1-74-{hash}`)
- Dynamic port allocation prevents port conflicts between concurrent runs
- Cleanup only affects containers belonging to the specific run ID
- Log directories are isolated per run: `control_logs/{runID}/`
**Critical Notes**:
- Only ONE test can run at a time (Docker port conflicts)
- Tests generate ~100MB of logs per run in `control_logs/`
- Clean environment before each test: `sudo rm -rf control_logs/202* && docker system prune -f`
- Running many tests concurrently may cause resource contention (CPU/memory)
- Clean stale containers periodically: `docker system prune -f`
### Test Artifacts Location

View File

@@ -18,9 +18,11 @@ import (
)
// cleanupBeforeTest performs cleanup operations before running tests.
// Only removes stale (stopped/exited) test containers to avoid interfering with concurrent test runs.
func cleanupBeforeTest(ctx context.Context) error {
if err := killTestContainers(ctx); err != nil {
return fmt.Errorf("failed to kill test containers: %w", err)
err := cleanupStaleTestContainers(ctx)
if err != nil {
return fmt.Errorf("failed to clean stale test containers: %w", err)
}
if err := pruneDockerNetworks(ctx); err != nil {
@@ -30,11 +32,25 @@ func cleanupBeforeTest(ctx context.Context) error {
return nil
}
// cleanupAfterTest removes the test container after completion.
func cleanupAfterTest(ctx context.Context, cli *client.Client, containerID string) error {
return cli.ContainerRemove(ctx, containerID, container.RemoveOptions{
// cleanupAfterTest removes the test container and all associated integration test containers for the run.
func cleanupAfterTest(ctx context.Context, cli *client.Client, containerID, runID string) error {
// Remove the main test container
err := cli.ContainerRemove(ctx, containerID, container.RemoveOptions{
Force: true,
})
if err != nil {
return fmt.Errorf("failed to remove test container: %w", err)
}
// Clean up integration test containers for this run only
if runID != "" {
err := killTestContainersByRunID(ctx, runID)
if err != nil {
return fmt.Errorf("failed to clean up containers for run %s: %w", runID, err)
}
}
return nil
}
// killTestContainers terminates and removes all test containers.
@@ -87,6 +103,100 @@ func killTestContainers(ctx context.Context) error {
return nil
}
// killTestContainersByRunID terminates and removes all test containers for a specific run ID.
// This function filters containers by the hi.run-id label to only affect containers
// belonging to the specified test run, leaving other concurrent test runs untouched.
func killTestContainersByRunID(ctx context.Context, runID string) error {
cli, err := createDockerClient()
if err != nil {
return fmt.Errorf("failed to create Docker client: %w", err)
}
defer cli.Close()
// Filter containers by hi.run-id label
containers, err := cli.ContainerList(ctx, container.ListOptions{
All: true,
Filters: filters.NewArgs(
filters.Arg("label", "hi.run-id="+runID),
),
})
if err != nil {
return fmt.Errorf("failed to list containers for run %s: %w", runID, err)
}
removed := 0
for _, cont := range containers {
// Kill the container if it's running
if cont.State == "running" {
_ = cli.ContainerKill(ctx, cont.ID, "KILL")
}
// Remove the container with retry logic
if removeContainerWithRetry(ctx, cli, cont.ID) {
removed++
}
}
if removed > 0 {
fmt.Printf("Removed %d containers for run ID %s\n", removed, runID)
}
return nil
}
// cleanupStaleTestContainers removes stopped/exited test containers without affecting running tests.
// This is useful for cleaning up leftover containers from previous crashed or interrupted test runs
// without interfering with currently running concurrent tests.
func cleanupStaleTestContainers(ctx context.Context) error {
cli, err := createDockerClient()
if err != nil {
return fmt.Errorf("failed to create Docker client: %w", err)
}
defer cli.Close()
// Only get stopped/exited containers
containers, err := cli.ContainerList(ctx, container.ListOptions{
All: true,
Filters: filters.NewArgs(
filters.Arg("status", "exited"),
filters.Arg("status", "dead"),
),
})
if err != nil {
return fmt.Errorf("failed to list stopped containers: %w", err)
}
removed := 0
for _, cont := range containers {
// Only remove containers that look like test containers
shouldRemove := false
for _, name := range cont.Names {
if strings.Contains(name, "headscale-test-suite") ||
strings.Contains(name, "hs-") ||
strings.Contains(name, "ts-") ||
strings.Contains(name, "derp-") {
shouldRemove = true
break
}
}
if shouldRemove {
if removeContainerWithRetry(ctx, cli, cont.ID) {
removed++
}
}
}
if removed > 0 {
fmt.Printf("Removed %d stale test containers\n", removed)
}
return nil
}
const (
containerRemoveInitialInterval = 100 * time.Millisecond
containerRemoveMaxElapsedTime = 2 * time.Second

View File

@@ -26,93 +26,8 @@ var (
ErrTestFailed = errors.New("test failed")
ErrUnexpectedContainerWait = errors.New("unexpected end of container wait")
ErrNoDockerContext = errors.New("no docker context found")
ErrAnotherRunInProgress = errors.New("another integration test run is already in progress")
)
// RunningTestInfo contains information about a currently running integration test.
type RunningTestInfo struct {
RunID string
ContainerID string
ContainerName string
StartTime time.Time
Duration time.Duration
TestPattern string
}
// ErrNoRunningTests indicates that no integration test is currently running.
var ErrNoRunningTests = errors.New("no running tests found")
// checkForRunningTests checks if there's already an integration test running.
// Returns ErrNoRunningTests if no test is running, or RunningTestInfo with details about the running test.
func checkForRunningTests(ctx context.Context) (*RunningTestInfo, error) {
cli, err := createDockerClient()
if err != nil {
return nil, fmt.Errorf("failed to create Docker client: %w", err)
}
defer cli.Close()
// List all running containers
containers, err := cli.ContainerList(ctx, container.ListOptions{
All: false, // Only running containers
})
if err != nil {
return nil, fmt.Errorf("failed to list containers: %w", err)
}
// Look for containers with hi.test-type=test-runner label
for _, cont := range containers {
if cont.Labels != nil && cont.Labels["hi.test-type"] == "test-runner" {
// Found a running test runner container
runID := cont.Labels["hi.run-id"]
containerName := ""
for _, name := range cont.Names {
containerName = strings.TrimPrefix(name, "/")
break
}
// Get more details via inspection
inspect, err := cli.ContainerInspect(ctx, cont.ID)
if err != nil {
// Return basic info if inspection fails
return &RunningTestInfo{
RunID: runID,
ContainerID: cont.ID,
ContainerName: containerName,
}, nil
}
startTime, _ := time.Parse(time.RFC3339Nano, inspect.State.StartedAt)
duration := time.Since(startTime)
// Try to extract test pattern from command
testPattern := ""
if len(inspect.Config.Cmd) > 0 {
for i, arg := range inspect.Config.Cmd {
if arg == "-run" && i+1 < len(inspect.Config.Cmd) {
testPattern = inspect.Config.Cmd[i+1]
break
}
}
}
return &RunningTestInfo{
RunID: runID,
ContainerID: cont.ID,
ContainerName: containerName,
StartTime: startTime,
Duration: duration,
TestPattern: testPattern,
}, nil
}
}
return nil, ErrNoRunningTests
}
// runTestContainer executes integration tests in a Docker container.
func runTestContainer(ctx context.Context, config *RunConfig) error {
cli, err := createDockerClient()
@@ -174,6 +89,9 @@ func runTestContainer(ctx context.Context, config *RunConfig) error {
}
log.Printf("Starting test: %s", config.TestPattern)
log.Printf("Run ID: %s", runID)
log.Printf("Monitor with: docker logs -f %s", containerName)
log.Printf("Logs directory: %s", logsDir)
// Start stats collection for container resource monitoring (if enabled)
var statsCollector *StatsCollector
@@ -234,9 +152,12 @@ func runTestContainer(ctx context.Context, config *RunConfig) error {
shouldCleanup := config.CleanAfter && (!config.KeepOnFailure || exitCode == 0)
if shouldCleanup {
if config.Verbose {
log.Printf("Running post-test cleanup...")
log.Printf("Running post-test cleanup for run %s...", runID)
}
if cleanErr := cleanupAfterTest(ctx, cli, resp.ID); cleanErr != nil && config.Verbose {
cleanErr := cleanupAfterTest(ctx, cli, resp.ID, runID)
if cleanErr != nil && config.Verbose {
log.Printf("Warning: post-test cleanup failed: %v", cleanErr)
}

View File

@@ -6,7 +6,6 @@ import (
"log"
"os"
"path/filepath"
"strings"
"time"
"github.com/creachadair/command"
@@ -14,65 +13,13 @@ import (
var ErrTestPatternRequired = errors.New("test pattern is required as first argument or use --test flag")
// formatRunningTestError creates a detailed error message about a running test.
func formatRunningTestError(info *RunningTestInfo) error {
var msg strings.Builder
msg.WriteString("\n")
msg.WriteString("╔══════════════════════════════════════════════════════════════════╗\n")
msg.WriteString("║ Another integration test run is already in progress! ║\n")
msg.WriteString("╚══════════════════════════════════════════════════════════════════╝\n")
msg.WriteString("\n")
msg.WriteString("Running test details:\n")
msg.WriteString(fmt.Sprintf(" Run ID: %s\n", info.RunID))
msg.WriteString(fmt.Sprintf(" Container: %s\n", info.ContainerName))
if info.TestPattern != "" {
msg.WriteString(fmt.Sprintf(" Test: %s\n", info.TestPattern))
}
if !info.StartTime.IsZero() {
msg.WriteString(fmt.Sprintf(" Started: %s\n", info.StartTime.Format("2006-01-02 15:04:05")))
msg.WriteString(fmt.Sprintf(" Running for: %s\n", formatDuration(info.Duration)))
}
msg.WriteString("\n")
msg.WriteString("Please wait for the current test to complete, or stop it with:\n")
msg.WriteString(" go run ./cmd/hi clean containers\n")
msg.WriteString("\n")
msg.WriteString("To monitor the running test:\n")
msg.WriteString(fmt.Sprintf(" docker logs -f %s\n", info.ContainerName))
return fmt.Errorf("%w\n%s", ErrAnotherRunInProgress, msg.String())
}
const secondsPerMinute = 60
// formatDuration formats a duration in a human-readable way.
func formatDuration(d time.Duration) string {
if d < time.Minute {
return fmt.Sprintf("%d seconds", int(d.Seconds()))
}
if d < time.Hour {
minutes := int(d.Minutes())
seconds := int(d.Seconds()) % secondsPerMinute
return fmt.Sprintf("%d minutes, %d seconds", minutes, seconds)
}
hours := int(d.Hours())
minutes := int(d.Minutes()) % secondsPerMinute
return fmt.Sprintf("%d hours, %d minutes", hours, minutes)
}
type RunConfig struct {
TestPattern string `flag:"test,Test pattern to run"`
Timeout time.Duration `flag:"timeout,default=120m,Test timeout"`
FailFast bool `flag:"failfast,default=true,Stop on first test failure"`
UsePostgres bool `flag:"postgres,default=false,Use PostgreSQL instead of SQLite"`
GoVersion string `flag:"go-version,Go version to use (auto-detected from go.mod)"`
CleanBefore bool `flag:"clean-before,default=true,Clean resources before test"`
CleanBefore bool `flag:"clean-before,default=true,Clean stale resources before test"`
CleanAfter bool `flag:"clean-after,default=true,Clean resources after test"`
KeepOnFailure bool `flag:"keep-on-failure,default=false,Keep containers on test failure"`
LogsDir string `flag:"logs-dir,default=control_logs,Control logs directory"`
@@ -80,7 +27,6 @@ type RunConfig struct {
Stats bool `flag:"stats,default=false,Collect and display container resource usage statistics"`
HSMemoryLimit float64 `flag:"hs-memory-limit,default=0,Fail test if any Headscale container exceeds this memory limit in MB (0 = disabled)"`
TSMemoryLimit float64 `flag:"ts-memory-limit,default=0,Fail test if any Tailscale container exceeds this memory limit in MB (0 = disabled)"`
Force bool `flag:"force,default=false,Kill any running test and start a new one"`
}
// runIntegrationTest executes the integration test workflow.
@@ -98,23 +44,6 @@ func runIntegrationTest(env *command.Env) error {
runConfig.GoVersion = detectGoVersion()
}
// Check if another test run is already in progress
runningTest, err := checkForRunningTests(env.Context())
if err != nil && !errors.Is(err, ErrNoRunningTests) {
log.Printf("Warning: failed to check for running tests: %v", err)
} else if runningTest != nil {
if runConfig.Force {
log.Printf("Force flag set, killing existing test run: %s", runningTest.RunID)
err = killTestContainers(env.Context())
if err != nil {
return fmt.Errorf("failed to kill existing test containers: %w", err)
}
} else {
return formatRunningTestError(runningTest)
}
}
// Run pre-flight checks
if runConfig.Verbose {
log.Printf("Running pre-flight system checks...")

View File

@@ -7,6 +7,7 @@
This page collects third-party tools, client libraries, and scripts related to headscale.
- [headscale-operator](https://github.com/infradohq/headscale-operator) - Headscale Kubernetes Operator
- [tailscale-manager](https://github.com/singlestore-labs/tailscale-manager) - Dynamically manage Tailscale route
advertisements
- [headscalebacktosqlite](https://github.com/bigbozza/headscalebacktosqlite) - Migrate headscale from PostgreSQL back to SQLite

flake.lock (generated)
View File

@@ -20,11 +20,11 @@
},
"nixpkgs": {
"locked": {
"lastModified": 1760533177,
"narHash": "sha256-OwM1sFustLHx+xmTymhucZuNhtq98fHIbfO8Swm5L8A=",
"lastModified": 1766840161,
"narHash": "sha256-Ss/LHpJJsng8vz1Pe33RSGIWUOcqM1fjrehjUkdrWio=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "35f590344ff791e6b1d6d6b8f3523467c9217caf",
"rev": "3edc4a30ed3903fdf6f90c837f961fa6b49582d1",
"type": "github"
},
"original": {

View File

@@ -229,7 +229,7 @@
apps.default = apps.headscale;
checks = {
headscale = pkgs.nixosTest (import ./nix/tests/headscale.nix);
headscale = pkgs.testers.nixosTest (import ./nix/tests/headscale.nix);
};
});
}

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/apikey.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/device.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/headscale.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go-grpc. DO NOT EDIT.
// versions:
// - protoc-gen-go-grpc v1.5.1
// - protoc-gen-go-grpc v1.6.0
// - protoc (unknown)
// source: headscale/v1/headscale.proto
@@ -387,79 +387,79 @@ type HeadscaleServiceServer interface {
type UnimplementedHeadscaleServiceServer struct{}
func (UnimplementedHeadscaleServiceServer) CreateUser(context.Context, *CreateUserRequest) (*CreateUserResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method CreateUser not implemented")
return nil, status.Error(codes.Unimplemented, "method CreateUser not implemented")
}
func (UnimplementedHeadscaleServiceServer) RenameUser(context.Context, *RenameUserRequest) (*RenameUserResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method RenameUser not implemented")
return nil, status.Error(codes.Unimplemented, "method RenameUser not implemented")
}
func (UnimplementedHeadscaleServiceServer) DeleteUser(context.Context, *DeleteUserRequest) (*DeleteUserResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method DeleteUser not implemented")
return nil, status.Error(codes.Unimplemented, "method DeleteUser not implemented")
}
func (UnimplementedHeadscaleServiceServer) ListUsers(context.Context, *ListUsersRequest) (*ListUsersResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ListUsers not implemented")
return nil, status.Error(codes.Unimplemented, "method ListUsers not implemented")
}
func (UnimplementedHeadscaleServiceServer) CreatePreAuthKey(context.Context, *CreatePreAuthKeyRequest) (*CreatePreAuthKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method CreatePreAuthKey not implemented")
return nil, status.Error(codes.Unimplemented, "method CreatePreAuthKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) ExpirePreAuthKey(context.Context, *ExpirePreAuthKeyRequest) (*ExpirePreAuthKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ExpirePreAuthKey not implemented")
return nil, status.Error(codes.Unimplemented, "method ExpirePreAuthKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) DeletePreAuthKey(context.Context, *DeletePreAuthKeyRequest) (*DeletePreAuthKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method DeletePreAuthKey not implemented")
return nil, status.Error(codes.Unimplemented, "method DeletePreAuthKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) ListPreAuthKeys(context.Context, *ListPreAuthKeysRequest) (*ListPreAuthKeysResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ListPreAuthKeys not implemented")
return nil, status.Error(codes.Unimplemented, "method ListPreAuthKeys not implemented")
}
func (UnimplementedHeadscaleServiceServer) DebugCreateNode(context.Context, *DebugCreateNodeRequest) (*DebugCreateNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method DebugCreateNode not implemented")
return nil, status.Error(codes.Unimplemented, "method DebugCreateNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) GetNode(context.Context, *GetNodeRequest) (*GetNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method GetNode not implemented")
return nil, status.Error(codes.Unimplemented, "method GetNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) SetTags(context.Context, *SetTagsRequest) (*SetTagsResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method SetTags not implemented")
return nil, status.Error(codes.Unimplemented, "method SetTags not implemented")
}
func (UnimplementedHeadscaleServiceServer) SetApprovedRoutes(context.Context, *SetApprovedRoutesRequest) (*SetApprovedRoutesResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method SetApprovedRoutes not implemented")
return nil, status.Error(codes.Unimplemented, "method SetApprovedRoutes not implemented")
}
func (UnimplementedHeadscaleServiceServer) RegisterNode(context.Context, *RegisterNodeRequest) (*RegisterNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method RegisterNode not implemented")
return nil, status.Error(codes.Unimplemented, "method RegisterNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) DeleteNode(context.Context, *DeleteNodeRequest) (*DeleteNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method DeleteNode not implemented")
return nil, status.Error(codes.Unimplemented, "method DeleteNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) ExpireNode(context.Context, *ExpireNodeRequest) (*ExpireNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ExpireNode not implemented")
return nil, status.Error(codes.Unimplemented, "method ExpireNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) RenameNode(context.Context, *RenameNodeRequest) (*RenameNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method RenameNode not implemented")
return nil, status.Error(codes.Unimplemented, "method RenameNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) ListNodes(context.Context, *ListNodesRequest) (*ListNodesResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ListNodes not implemented")
return nil, status.Error(codes.Unimplemented, "method ListNodes not implemented")
}
func (UnimplementedHeadscaleServiceServer) BackfillNodeIPs(context.Context, *BackfillNodeIPsRequest) (*BackfillNodeIPsResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method BackfillNodeIPs not implemented")
return nil, status.Error(codes.Unimplemented, "method BackfillNodeIPs not implemented")
}
func (UnimplementedHeadscaleServiceServer) CreateApiKey(context.Context, *CreateApiKeyRequest) (*CreateApiKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method CreateApiKey not implemented")
return nil, status.Error(codes.Unimplemented, "method CreateApiKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) ExpireApiKey(context.Context, *ExpireApiKeyRequest) (*ExpireApiKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ExpireApiKey not implemented")
return nil, status.Error(codes.Unimplemented, "method ExpireApiKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) ListApiKeys(context.Context, *ListApiKeysRequest) (*ListApiKeysResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ListApiKeys not implemented")
return nil, status.Error(codes.Unimplemented, "method ListApiKeys not implemented")
}
func (UnimplementedHeadscaleServiceServer) DeleteApiKey(context.Context, *DeleteApiKeyRequest) (*DeleteApiKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method DeleteApiKey not implemented")
return nil, status.Error(codes.Unimplemented, "method DeleteApiKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) GetPolicy(context.Context, *GetPolicyRequest) (*GetPolicyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method GetPolicy not implemented")
return nil, status.Error(codes.Unimplemented, "method GetPolicy not implemented")
}
func (UnimplementedHeadscaleServiceServer) SetPolicy(context.Context, *SetPolicyRequest) (*SetPolicyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method SetPolicy not implemented")
return nil, status.Error(codes.Unimplemented, "method SetPolicy not implemented")
}
func (UnimplementedHeadscaleServiceServer) Health(context.Context, *HealthRequest) (*HealthResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method Health not implemented")
return nil, status.Error(codes.Unimplemented, "method Health not implemented")
}
func (UnimplementedHeadscaleServiceServer) mustEmbedUnimplementedHeadscaleServiceServer() {}
func (UnimplementedHeadscaleServiceServer) testEmbeddedByValue() {}
@@ -472,7 +472,7 @@ type UnsafeHeadscaleServiceServer interface {
}
func RegisterHeadscaleServiceServer(s grpc.ServiceRegistrar, srv HeadscaleServiceServer) {
// If the following call pancis, it indicates UnimplementedHeadscaleServiceServer was
// If the following call panics, it indicates UnimplementedHeadscaleServiceServer was
// embedded by pointer and is nil. This will cause panics if an
// unimplemented method is ever invoked, so we test this at initialization
// time to prevent it from happening at runtime later due to I/O.

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/node.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/policy.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/preauthkey.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/user.proto

View File

@@ -247,9 +247,9 @@ func nodeToRegisterResponse(node types.NodeView) *tailcfg.RegisterResponse {
if node.IsTagged() {
resp.User = types.TaggedDevices.View().TailscaleUser()
resp.Login = types.TaggedDevices.View().TailscaleLogin()
} else if node.UserView().Valid() {
resp.User = node.UserView().TailscaleUser()
resp.Login = node.UserView().TailscaleLogin()
} else if node.Owner().Valid() {
resp.User = node.Owner().TailscaleUser()
resp.Login = node.Owner().TailscaleLogin()
}
return resp
@@ -389,8 +389,8 @@ func (h *Headscale) handleRegisterWithAuthKey(
resp := &tailcfg.RegisterResponse{
MachineAuthorized: true,
NodeKeyExpired: node.IsExpired(),
User: node.UserView().TailscaleUser(),
Login: node.UserView().TailscaleLogin(),
User: node.Owner().TailscaleUser(),
Login: node.Owner().TailscaleLogin(),
}
log.Trace().

View File

@@ -16,6 +16,7 @@ var (
ErrInvalidAutoVacuum = errors.New("invalid auto_vacuum")
ErrWALAutocheckpoint = errors.New("wal_autocheckpoint must be >= -1")
ErrInvalidSynchronous = errors.New("invalid synchronous")
ErrInvalidTxLock = errors.New("invalid txlock")
)
const (
@@ -225,6 +226,62 @@ func (s Synchronous) String() string {
return string(s)
}
// TxLock represents SQLite transaction lock mode.
// Transaction lock mode determines when write locks are acquired during transactions.
//
// Lock Acquisition Behavior:
//
// DEFERRED - SQLite default, acquire lock lazily:
// - Transaction starts without any lock
// - First read acquires SHARED lock
// - First write attempts to upgrade to RESERVED lock
// - If another transaction holds RESERVED: SQLITE_BUSY (potential deadlock)
// - Can cause deadlocks when multiple connections attempt concurrent writes
//
// IMMEDIATE - Recommended for write-heavy workloads:
// - Transaction immediately acquires RESERVED lock at BEGIN
// - If lock unavailable, waits up to busy_timeout before failing
// - Other writers queue orderly instead of deadlocking
// - Prevents the upgrade-lock deadlock scenario
// - Slight overhead for read-only transactions that don't need locks
//
// EXCLUSIVE - Maximum isolation:
// - Transaction immediately acquires EXCLUSIVE lock at BEGIN
// - No other connections can read or write
// - Highest isolation but lowest concurrency
// - Rarely needed in practice
type TxLock string
const (
// TxLockDeferred acquires locks lazily (SQLite default).
// Risk of SQLITE_BUSY deadlocks with concurrent writers. Use for read-heavy workloads.
TxLockDeferred TxLock = "deferred"
// TxLockImmediate acquires write lock immediately (RECOMMENDED for production).
// Prevents deadlocks by acquiring RESERVED lock at transaction start.
// Writers queue orderly, respecting busy_timeout.
TxLockImmediate TxLock = "immediate"
// TxLockExclusive acquires exclusive lock immediately.
// Maximum isolation, no concurrent reads or writes. Rarely needed.
TxLockExclusive TxLock = "exclusive"
)
// IsValid returns true if the TxLock is valid.
func (t TxLock) IsValid() bool {
switch t {
case TxLockDeferred, TxLockImmediate, TxLockExclusive, "":
return true
default:
return false
}
}
// String returns the string representation.
func (t TxLock) String() string {
return string(t)
}
// Config holds SQLite database configuration with type-safe enums.
// This configuration balances performance, durability, and operational requirements
// for Headscale's SQLite database usage patterns.
@@ -236,6 +293,7 @@ type Config struct {
WALAutocheckpoint int // pages (-1 = default/not set, 0 = disabled, >0 = enabled)
Synchronous Synchronous // synchronous mode (affects durability vs performance)
ForeignKeys bool // enable foreign key constraints (data integrity)
TxLock TxLock // transaction lock mode (affects write concurrency)
}
// Default returns the production configuration optimized for Headscale's usage patterns.
@@ -244,6 +302,7 @@ type Config struct {
// - Data durability with good performance (NORMAL synchronous)
// - Automatic space management (INCREMENTAL auto-vacuum)
// - Data integrity (foreign key constraints enabled)
// - Safe concurrent writes (IMMEDIATE transaction lock)
// - Reasonable timeout for busy database scenarios (10s)
func Default(path string) *Config {
return &Config{
@@ -254,6 +313,7 @@ func Default(path string) *Config {
WALAutocheckpoint: 1000,
Synchronous: SynchronousNormal,
ForeignKeys: true,
TxLock: TxLockImmediate,
}
}
@@ -292,6 +352,10 @@ func (c *Config) Validate() error {
return fmt.Errorf("%w: %s", ErrInvalidSynchronous, c.Synchronous)
}
if c.TxLock != "" && !c.TxLock.IsValid() {
return fmt.Errorf("%w: %s", ErrInvalidTxLock, c.TxLock)
}
return nil
}
@@ -332,12 +396,20 @@ func (c *Config) ToURL() (string, error) {
baseURL = "file:" + c.Path
}
// Add parameters without encoding = signs
if len(pragmas) > 0 {
var queryParts []string
for _, pragma := range pragmas {
queryParts = append(queryParts, "_pragma="+pragma)
}
// Build query parameters
queryParts := make([]string, 0, 1+len(pragmas))
// Add _txlock first (it's a connection parameter, not a pragma)
if c.TxLock != "" {
queryParts = append(queryParts, "_txlock="+string(c.TxLock))
}
// Add pragma parameters
for _, pragma := range pragmas {
queryParts = append(queryParts, "_pragma="+pragma)
}
if len(queryParts) > 0 {
baseURL += "?" + strings.Join(queryParts, "&")
}
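As a usage sketch of the new option (the import path is assumed; the expected DSN mirrors the updated "default config includes txlock immediate" test case below):

```go
package main

import (
	"fmt"
	"log"

	// Assumed import path for the sqliteconfig package shown in this diff.
	"github.com/juanfont/headscale/hscontrol/db/sqliteconfig"
)

func main() {
	// Default() now sets TxLock to "immediate".
	cfg := sqliteconfig.Default("/var/lib/headscale/db.sqlite")

	dsn, err := cfg.ToURL()
	if err != nil {
		log.Fatal(err)
	}

	// _txlock is emitted first because it is a connection parameter,
	// followed by the _pragma parameters, e.g.:
	// file:/var/lib/headscale/db.sqlite?_txlock=immediate&_pragma=busy_timeout=10000&...
	fmt.Println(dsn)
}
```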

View File

@@ -71,6 +71,52 @@ func TestSynchronous(t *testing.T) {
}
}
func TestTxLock(t *testing.T) {
tests := []struct {
mode TxLock
valid bool
}{
{TxLockDeferred, true},
{TxLockImmediate, true},
{TxLockExclusive, true},
{TxLock(""), true}, // empty is valid (uses driver default)
{TxLock("IMMEDIATE"), false}, // uppercase is invalid
{TxLock("INVALID"), false},
}
for _, tt := range tests {
name := string(tt.mode)
if name == "" {
name = "empty"
}
t.Run(name, func(t *testing.T) {
if got := tt.mode.IsValid(); got != tt.valid {
t.Errorf("TxLock(%q).IsValid() = %v, want %v", tt.mode, got, tt.valid)
}
})
}
}
func TestTxLockString(t *testing.T) {
tests := []struct {
mode TxLock
want string
}{
{TxLockDeferred, "deferred"},
{TxLockImmediate, "immediate"},
{TxLockExclusive, "exclusive"},
}
for _, tt := range tests {
t.Run(tt.want, func(t *testing.T) {
if got := tt.mode.String(); got != tt.want {
t.Errorf("TxLock.String() = %q, want %q", got, tt.want)
}
})
}
}
func TestConfigValidate(t *testing.T) {
tests := []struct {
name string
@@ -104,6 +150,21 @@ func TestConfigValidate(t *testing.T) {
},
wantErr: true,
},
{
name: "invalid txlock",
config: &Config{
Path: "/path/to/db.sqlite",
TxLock: TxLock("INVALID"),
},
wantErr: true,
},
{
name: "valid txlock immediate",
config: &Config{
Path: "/path/to/db.sqlite",
TxLock: TxLockImmediate,
},
},
}
for _, tt := range tests {
@@ -123,9 +184,9 @@ func TestConfigToURL(t *testing.T) {
want string
}{
{
name: "default config",
name: "default config includes txlock immediate",
config: Default("/path/to/db.sqlite"),
want: "file:/path/to/db.sqlite?_pragma=busy_timeout=10000&_pragma=journal_mode=WAL&_pragma=auto_vacuum=INCREMENTAL&_pragma=wal_autocheckpoint=1000&_pragma=synchronous=NORMAL&_pragma=foreign_keys=ON",
want: "file:/path/to/db.sqlite?_txlock=immediate&_pragma=busy_timeout=10000&_pragma=journal_mode=WAL&_pragma=auto_vacuum=INCREMENTAL&_pragma=wal_autocheckpoint=1000&_pragma=synchronous=NORMAL&_pragma=foreign_keys=ON",
},
{
name: "memory config",
@@ -183,6 +244,47 @@ func TestConfigToURL(t *testing.T) {
},
want: "file:/full.db?_pragma=busy_timeout=15000&_pragma=journal_mode=WAL&_pragma=auto_vacuum=FULL&_pragma=wal_autocheckpoint=1000&_pragma=synchronous=EXTRA&_pragma=foreign_keys=ON",
},
{
name: "with txlock immediate",
config: &Config{
Path: "/test.db",
BusyTimeout: 5000,
TxLock: TxLockImmediate,
WALAutocheckpoint: -1,
ForeignKeys: true,
},
want: "file:/test.db?_txlock=immediate&_pragma=busy_timeout=5000&_pragma=foreign_keys=ON",
},
{
name: "with txlock deferred",
config: &Config{
Path: "/test.db",
TxLock: TxLockDeferred,
WALAutocheckpoint: -1,
ForeignKeys: true,
},
want: "file:/test.db?_txlock=deferred&_pragma=foreign_keys=ON",
},
{
name: "with txlock exclusive",
config: &Config{
Path: "/test.db",
TxLock: TxLockExclusive,
WALAutocheckpoint: -1,
},
want: "file:/test.db?_txlock=exclusive",
},
{
name: "empty txlock omitted from URL",
config: &Config{
Path: "/test.db",
TxLock: "",
BusyTimeout: 1000,
WALAutocheckpoint: -1,
ForeignKeys: true,
},
want: "file:/test.db?_pragma=busy_timeout=1000&_pragma=foreign_keys=ON",
},
}
for _, tt := range tests {
@@ -209,3 +311,10 @@ func TestConfigToURLInvalid(t *testing.T) {
t.Error("Config.ToURL() with invalid config should return error")
}
}
func TestDefaultConfigHasTxLockImmediate(t *testing.T) {
config := Default("/test.db")
if config.TxLock != TxLockImmediate {
t.Errorf("Default().TxLock = %q, want %q", config.TxLock, TxLockImmediate)
}
}

View File

@@ -69,18 +69,19 @@ func newMapper(
}
}
// generateUserProfiles creates user profiles for MapResponse.
func generateUserProfiles(
node types.NodeView,
peers views.Slice[types.NodeView],
) []tailcfg.UserProfile {
userMap := make(map[uint]*types.UserView)
ids := make([]uint, 0, len(userMap))
user := node.User()
user := node.Owner()
userID := user.Model().ID
userMap[userID] = &user
ids = append(ids, userID)
for _, peer := range peers.All() {
peerUser := peer.User()
peerUser := peer.Owner()
peerUserID := peerUser.Model().ID
userMap[peerUserID] = &peerUser
ids = append(ids, peerUserID)

View File

@@ -78,7 +78,7 @@ func (s *State) DebugOverview() string {
now := time.Now()
for _, node := range allNodes.All() {
if node.Valid() {
userName := node.User().Name()
userName := node.Owner().Name()
userNodeCounts[userName]++
if node.IsOnline().Valid() && node.IsOnline().Get() {
@@ -281,7 +281,7 @@ func (s *State) DebugOverviewJSON() DebugOverviewInfo {
for _, node := range allNodes.All() {
if node.Valid() {
userName := node.User().Name()
userName := node.Owner().Name()
info.Users[userName]++
if node.IsOnline().Valid() && node.IsOnline().Get() {

View File

@@ -509,15 +509,27 @@ func (s *NodeStore) DebugString() string {
sb.WriteString(fmt.Sprintf("Users with Nodes: %d\n", len(snapshot.nodesByUser)))
sb.WriteString("\n")
// User distribution
sb.WriteString("Nodes by User:\n")
// User distribution (shows internal UserID tracking, not display owner)
sb.WriteString("Nodes by Internal User ID:\n")
for userID, nodes := range snapshot.nodesByUser {
if len(nodes) > 0 {
userName := "unknown"
taggedCount := 0
if len(nodes) > 0 && nodes[0].Valid() {
userName = nodes[0].User().Name()
// Count tagged nodes (which have UserID set but are owned by "tagged-devices")
for _, n := range nodes {
if n.IsTagged() {
taggedCount++
}
}
}
if taggedCount > 0 {
sb.WriteString(fmt.Sprintf(" - User %d (%s): %d nodes (%d tagged)\n", userID, userName, len(nodes), taggedCount))
} else {
sb.WriteString(fmt.Sprintf(" - User %d (%s): %d nodes\n", userID, userName, len(nodes)))
}
sb.WriteString(fmt.Sprintf(" - User %d (%s): %d nodes\n", userID, userName, len(nodes)))
}
}
sb.WriteString("\n")

View File

@@ -719,7 +719,13 @@ func (node Node) DebugString() string {
return sb.String()
}
func (nv NodeView) UserView() UserView {
// Owner returns the owner for display purposes.
// For tagged nodes, returns TaggedDevices. For user-owned nodes, returns the user.
func (nv NodeView) Owner() UserView {
if nv.IsTagged() {
return TaggedDevices.View()
}
return nv.User()
}

View File

@@ -659,6 +659,123 @@ func TestPreAuthKeyCorrectUserLoggedInCommand(t *testing.T) {
}, 20*time.Second, 1*time.Second)
}
func TestTaggedNodesCLIOutput(t *testing.T) {
IntegrationSkip(t)
user1 := "user1"
user2 := "user2"
spec := ScenarioSpec{
NodesPerUser: 1,
Users: []string{user1},
}
scenario, err := NewScenario(spec)
require.NoError(t, err)
defer scenario.ShutdownAssertNoPanics(t)
err = scenario.CreateHeadscaleEnv(
[]tsic.Option{},
hsic.WithTestName("tagcli"),
hsic.WithEmbeddedDERPServerOnly(),
hsic.WithTLS(),
)
require.NoError(t, err)
headscale, err := scenario.Headscale()
require.NoError(t, err)
u2, err := headscale.CreateUser(user2)
require.NoError(t, err)
var user2Key v1.PreAuthKey
// Create a tagged PreAuthKey for user2
assert.EventuallyWithT(t, func(c *assert.CollectT) {
err = executeAndUnmarshal(
headscale,
[]string{
"headscale",
"preauthkeys",
"--user",
strconv.FormatUint(u2.GetId(), 10),
"create",
"--reusable",
"--expiration",
"24h",
"--output",
"json",
"--tags",
"tag:test1,tag:test2",
},
&user2Key,
)
assert.NoError(c, err)
}, 10*time.Second, 200*time.Millisecond, "Waiting for user2 tagged preauth key creation")
allClients, err := scenario.ListTailscaleClients()
requireNoErrListClients(t, err)
require.Len(t, allClients, 1)
client := allClients[0]
// Log out from user1
err = client.Logout()
require.NoError(t, err)
err = scenario.WaitForTailscaleLogout()
require.NoError(t, err)
assert.EventuallyWithT(t, func(ct *assert.CollectT) {
status, err := client.Status()
assert.NoError(ct, err)
assert.NotContains(ct, []string{"Starting", "Running"}, status.BackendState,
"Expected node to be logged out, backend state: %s", status.BackendState)
}, 30*time.Second, 2*time.Second)
// Log in with the tagged PreAuthKey (from user2, with tags)
err = client.Login(headscale.GetEndpoint(), user2Key.GetKey())
require.NoError(t, err)
assert.EventuallyWithT(t, func(ct *assert.CollectT) {
status, err := client.Status()
assert.NoError(ct, err)
assert.Equal(ct, "Running", status.BackendState, "Expected node to be logged in, backend state: %s", status.BackendState)
// With tags-as-identity model, tagged nodes show as TaggedDevices user (2147455555)
assert.Equal(ct, "userid:2147455555", status.Self.UserID.String(), "Expected node to be logged in as tagged-devices user")
}, 30*time.Second, 2*time.Second)
// Wait for the second node to appear
var listNodes []*v1.Node
assert.EventuallyWithT(t, func(ct *assert.CollectT) {
var err error
listNodes, err = headscale.ListNodes()
assert.NoError(ct, err)
assert.Len(ct, listNodes, 2, "Should have 2 nodes after re-login with tagged key")
assert.Equal(ct, user1, listNodes[0].GetUser().GetName(), "First node should belong to user1")
assert.Equal(ct, "tagged-devices", listNodes[1].GetUser().GetName(), "Second node should be tagged-devices")
}, 20*time.Second, 1*time.Second)
// Test: tailscale status output should show "tagged-devices" not "userid:2147455555"
// This is the fix for issue #2970 - the Tailscale client should display user-friendly names
assert.EventuallyWithT(t, func(ct *assert.CollectT) {
stdout, stderr, err := client.Execute([]string{"tailscale", "status"})
assert.NoError(ct, err, "tailscale status command should succeed, stderr: %s", stderr)
t.Logf("Tailscale status output:\n%s", stdout)
// The output should contain "tagged-devices" for tagged nodes
assert.Contains(ct, stdout, "tagged-devices", "Tailscale status should show 'tagged-devices' for tagged nodes")
// The output should NOT show the raw numeric userid to the user
assert.NotContains(ct, stdout, "userid:2147455555", "Tailscale status should not show numeric userid for tagged nodes")
}, 20*time.Second, 1*time.Second)
}
func TestApiKeyCommand(t *testing.T) {
IntegrationSkip(t)

View File

@@ -147,7 +147,18 @@ func New(
return nil, err
}
hostname := fmt.Sprintf("derp-%s-%s", strings.ReplaceAll(version, ".", "-"), hash)
// Include run ID in hostname for easier identification of which test run owns this container
runID := dockertestutil.GetIntegrationRunID()
var hostname string
if runID != "" {
// Use last 6 chars of run ID (the random hash part) for brevity
runIDShort := runID[len(runID)-6:]
hostname = fmt.Sprintf("derp-%s-%s-%s", runIDShort, strings.ReplaceAll(version, ".", "-"), hash)
} else {
hostname = fmt.Sprintf("derp-%s-%s", strings.ReplaceAll(version, ".", "-"), hash)
}
tlsCert, tlsKey, err := integrationutil.CreateCertificate(hostname)
if err != nil {
return nil, fmt.Errorf("failed to create certificates for headscale test: %w", err)

View File

@@ -74,6 +74,7 @@ type HeadscaleInContainer struct {
// optional config
port int
extraPorts []string
hostMetricsPort string // Dynamically assigned host port for metrics/pprof access
caCerts [][]byte
hostPortBindings map[string][]string
aclPolicy *policyv2.Policy
@@ -330,7 +331,18 @@ func New(
return nil, err
}
hostname := "hs-" + hash
// Include run ID in hostname for easier identification of which test run owns this container
runID := dockertestutil.GetIntegrationRunID()
var hostname string
if runID != "" {
// Use last 6 chars of run ID (the random hash part) for brevity
runIDShort := runID[len(runID)-6:]
hostname = fmt.Sprintf("hs-%s-%s", runIDShort, hash)
} else {
hostname = "hs-" + hash
}
hsic := &HeadscaleInContainer{
hostname: hostname,
@@ -438,13 +450,13 @@ func New(
Env: env,
}
// Bind metrics port to predictable host port
// Bind metrics port to dynamic host port (kernel assigns free port)
if runOptions.PortBindings == nil {
runOptions.PortBindings = map[docker.Port][]docker.PortBinding{}
}
runOptions.PortBindings["9090/tcp"] = []docker.PortBinding{
{HostPort: "49090"},
{HostPort: "0"}, // Let kernel assign a free port
}
if len(hsic.hostPortBindings) > 0 {
@@ -540,9 +552,14 @@ func New(
hsic.container = container
// Get the dynamically assigned host port for metrics/pprof
hsic.hostMetricsPort = container.GetHostPort("9090/tcp")
log.Printf(
"Ports for %s: metrics/pprof=49090\n",
"Headscale %s metrics available at http://localhost:%s/metrics (debug at http://localhost:%s/debug/)\n",
hsic.hostname,
hsic.hostMetricsPort,
hsic.hostMetricsPort,
)
// Write the CA certificates to the container
@@ -932,6 +949,13 @@ func (t *HeadscaleInContainer) GetPort() string {
return strconv.Itoa(t.port)
}
// GetHostMetricsPort returns the dynamically assigned host port for metrics/pprof access.
// This port can be used by operators to access metrics at http://localhost:{port}/metrics
// and debug endpoints at http://localhost:{port}/debug/ while tests are running.
func (t *HeadscaleInContainer) GetHostMetricsPort() string {
return t.hostMetricsPort
}
// GetHealthEndpoint returns a health endpoint for the HeadscaleInContainer
// instance.
func (t *HeadscaleInContainer) GetHealthEndpoint() string {
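A sketch of how a test could use the new accessor while a scenario is running (obtaining the `*hsic.HeadscaleInContainer` is elided, and the HTTP check itself is an assumption, not project code):

```go
import (
	"fmt"
	"net/http"
	"testing"

	"github.com/juanfont/headscale/integration/hsic" // assumed import path
	"github.com/stretchr/testify/require"
)

// checkMetricsReachable is a hypothetical helper: it builds the metrics URL
// from the dynamically assigned host port and verifies it answers.
func checkMetricsReachable(t *testing.T, hs *hsic.HeadscaleInContainer) {
	t.Helper()

	url := fmt.Sprintf("http://localhost:%s/metrics", hs.GetHostMetricsPort())

	resp, err := http.Get(url) // reachable from the host while the test runs
	require.NoError(t, err)
	defer resp.Body.Close()

	require.Equal(t, http.StatusOK, resp.StatusCode)
}
```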

View File

@@ -247,9 +247,14 @@ func (s *Scenario) AddNetwork(name string) (*dockertest.Network, error) {
// We run the test suite in a docker container that calls a couple of endpoints for
// readiness checks, this ensures that we can run the tests with individual networks
// and have the client reach the different containers
// TODO(kradalby): Can the test-suite be renamed so we can have multiple?
err = dockertestutil.AddContainerToNetwork(s.pool, network, "headscale-test-suite")
// and have the client reach the different containers.
// The container name includes the run ID to support multiple concurrent test runs.
testSuiteName := "headscale-test-suite"
if runID := dockertestutil.GetIntegrationRunID(); runID != "" {
testSuiteName = "headscale-test-suite-" + runID
}
err = dockertestutil.AddContainerToNetwork(s.pool, network, testSuiteName)
if err != nil {
return nil, fmt.Errorf("failed to add test suite container to network: %w", err)
}

View File

@@ -307,7 +307,18 @@ func New(
return nil, err
}
hostname := fmt.Sprintf("ts-%s-%s", strings.ReplaceAll(version, ".", "-"), hash)
// Include run ID in hostname for easier identification of which test run owns this container
runID := dockertestutil.GetIntegrationRunID()
var hostname string
if runID != "" {
// Use last 6 chars of run ID (the random hash part) for brevity
runIDShort := runID[len(runID)-6:]
hostname = fmt.Sprintf("ts-%s-%s-%s", runIDShort, strings.ReplaceAll(version, ".", "-"), hash)
} else {
hostname = fmt.Sprintf("ts-%s-%s", strings.ReplaceAll(version, ".", "-"), hash)
}
tsic := &TailscaleInContainer{
version: version,