Compare commits

...

8 Commits

Author SHA1 Message Date
Kristoffer Dalby
72fcb93ef3 cli: ensure tagged-devices is included in profile list (#2991) 2026-01-09 16:31:23 +01:00
Kristoffer Dalby
f5c779626a nix: use testers.nixosTest instead of nixosTest
nixosTest was renamed to testers.nixosTest in nixpkgs.
2026-01-09 12:57:32 +01:00
Kristoffer Dalby
d227b3a135 docs: update integration testing docs for concurrent execution
Update documentation to reflect the new concurrent test execution
capabilities and add guidance on run ID isolation.

AGENTS.md:
- Add examples for running multiple tests concurrently
- Document run ID format and container naming conventions
- Update "Critical Notes" to explain isolation mechanisms

.claude/agents/headscale-integration-tester.md:
- Add "Concurrent Execution and Run ID Isolation" section
- Document forbidden and safe operations for cleanup
- Add "Agent Session Isolation Rules" for multi-agent environments
- Add 6th core responsibility about concurrent execution awareness
- Add ISOLATION PRINCIPLE to critical principles
- Update pre-test cleanup documentation
2026-01-09 12:34:16 +01:00
Kristoffer Dalby
0bcfdc29ad cmd/hi: enable concurrent test execution
Remove the concurrent test prevention logic and update cleanup to use
run ID-based isolation, allowing multiple tests to run simultaneously.

Changes:
- cleanup: Add killTestContainersByRunID() to clean only containers
  belonging to a specific run, add cleanupStaleTestContainers() to
  remove only stopped/exited containers without affecting running tests
- docker: Remove RunningTestInfo, checkForRunningTests(), and related
  error types, update cleanupAfterTest() to use run ID-based cleanup
- run: Remove Force flag and concurrent test prevention check

The test runner now:
- Allows multiple concurrent test runs on the same Docker daemon
- Cleans only stale containers before tests (not running ones)
- Cleans only containers with matching run ID after tests
- Prints run ID and monitoring info for operator visibility
2026-01-09 12:34:16 +01:00
Kristoffer Dalby
87c230d251 integration: add run ID isolation for concurrent test execution
Add run ID-based isolation to container naming and network setup to
enable multiple integration tests to run concurrently on the same
Docker daemon without conflicts.

Changes:
- hsic: Add run ID prefix to headscale container names and use dynamic
  port allocation for metrics endpoint (port 0 lets kernel assign)
- tsic: Add run ID prefix to tailscale container names
- dsic: Add run ID prefix to DERP container names
- scenario: Use run ID-aware test suite container name for network setup

Container naming now follows: {type}-{runIDShort}-{identifier}-{hash}
Example: ts-mdjtzx-1-74-fgdyls, hs-mdjtzx-pingallbyip-abc123

The run ID is obtained from HEADSCALE_INTEGRATION_RUN_ID environment
variable via dockertestutil.GetIntegrationRunID().
2026-01-09 12:34:16 +01:00
github-actions[bot]
84c092a9f9 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/35f5903' (2025-10-15)
  → 'github:NixOS/nixpkgs/3edc4a3' (2025-12-27)
2025-12-28 08:25:28 +01:00
Florian Preinstorfer
9146140217 Add headscale-operator
Ref: #1523
2025-12-23 20:22:57 +01:00
Kristoffer Dalby
5103b35f3c sqliteconfig: add config opt for tx locking
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-22 14:01:40 +01:00
29 changed files with 697 additions and 244 deletions

View File

@@ -71,7 +71,7 @@ go run ./cmd/hi run "TestName" --timeout=60s
- **Slow tests** (5+ min): Node expiration, HA failover
- **Long-running tests** (10+ min): `TestNodeOnlineStatus` runs for 12 minutes
**CRITICAL**: Only ONE test can run at a time due to Docker port conflicts and resource constraints.
**CONCURRENT EXECUTION**: Multiple tests CAN run simultaneously. Each test run gets a unique Run ID for isolation. See "Concurrent Execution and Run ID Isolation" section below.
## Test Artifacts and Log Analysis
@@ -98,6 +98,97 @@ When tests fail, examine artifacts in this specific order:
4. **Client status dumps** (`*_status.json`): Network state and peer connectivity information
5. **Database snapshots** (`.db` files): For data consistency and state persistence issues
## Concurrent Execution and Run ID Isolation
### Overview
The integration test system supports running multiple tests concurrently on the same Docker daemon. Each test run is isolated through a unique Run ID that ensures containers, networks, and cleanup operations don't interfere with each other.
### Run ID Format and Usage
Each test run generates a unique Run ID in the format: `YYYYMMDD-HHMMSS-{6-char-hash}`
- Example: `20260109-104215-mdjtzx`
The Run ID is used for:
- **Container naming**: `ts-{runIDShort}-{version}-{hash}` (e.g., `ts-mdjtzx-1-74-fgdyls`)
- **Docker labels**: All containers get `hi.run-id={runID}` label
- **Log directories**: `control_logs/{runID}/`
- **Cleanup isolation**: Only containers with matching run ID are cleaned up
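For illustration only, a minimal Go sketch of producing an ID in this format and reading it back from `HEADSCALE_INTEGRATION_RUN_ID` (the helper names and character set here are assumptions, not the actual `cmd/hi` or `dockertestutil` implementation):

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"os"
	"time"
)

const runIDChars = "abcdefghijklmnopqrstuvwxyz0123456789"

// newRunID returns an ID like "20260109-104215-mdjtzx":
// a timestamp plus a 6-character random suffix.
func newRunID() string {
	suffix := make([]byte, 6)
	for i := range suffix {
		n, _ := rand.Int(rand.Reader, big.NewInt(int64(len(runIDChars))))
		suffix[i] = runIDChars[n.Int64()]
	}

	return fmt.Sprintf("%s-%s", time.Now().Format("20060102-150405"), suffix)
}

func main() {
	// Test helpers read the ID back from the environment
	// (HEADSCALE_INTEGRATION_RUN_ID, per the commit message above).
	runID := os.Getenv("HEADSCALE_INTEGRATION_RUN_ID")
	if runID == "" {
		runID = newRunID()
	}

	fmt.Println("Run ID:", runID)
	if len(runID) >= 6 {
		// The last 6 characters are the short form used in container names.
		fmt.Println("short:", runID[len(runID)-6:])
	}
}
```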
### Container Isolation Mechanisms
1. **Unique Container Names**: Each container includes the run ID for identification
2. **Docker Labels**: `hi.run-id` and `hi.test-type` labels on all containers
3. **Dynamic Port Allocation**: All ports use `{HostPort: "0"}` to let kernel assign free ports
4. **Per-Run Networks**: Network names include scenario hash for isolation
5. **Isolated Cleanup**: `killTestContainersByRunID()` only removes containers matching the run ID
### ⚠️ CRITICAL: Never Interfere with Other Test Runs
**FORBIDDEN OPERATIONS** when other tests may be running:
```bash
# ❌ NEVER do global container cleanup while tests are running
docker rm -f $(docker ps -q --filter "name=hs-")
docker rm -f $(docker ps -q --filter "name=ts-")
# ❌ NEVER kill all test containers
# This will destroy other agents' test sessions!
# ❌ NEVER prune all Docker resources during active tests
docker system prune -f # Only safe when NO tests are running
```
**SAFE OPERATIONS**:
```bash
# ✅ Clean up only YOUR test run's containers (by run ID)
# The test runner does this automatically via cleanup functions
# ✅ Clean stale (stopped/exited) containers only
# Pre-test cleanup only removes stopped containers, not running ones
# ✅ Check what's running before cleanup
docker ps --filter "name=headscale-test-suite" --format "{{.Names}}"
```
### Running Concurrent Tests
```bash
# Start multiple tests in parallel - each gets unique run ID
go run ./cmd/hi run "TestPingAllByIP" &
go run ./cmd/hi run "TestACLAllowUserDst" &
go run ./cmd/hi run "TestOIDCAuthenticationPingAll" &
# Monitor running test suites
docker ps --filter "name=headscale-test-suite" --format "table {{.Names}}\t{{.Status}}"
```
### Agent Session Isolation Rules
When working as an agent:
1. **Your run ID is unique**: Each test you start gets its own run ID
2. **Never clean up globally**: Only use run ID-specific cleanup
3. **Check before cleanup**: Verify no other tests are running if you need to prune resources
4. **Respect other sessions**: Other agents may have tests running concurrently
5. **Log directories are isolated**: Your artifacts are in `control_logs/{your-run-id}/`
### Identifying Your Containers
Your test containers can be identified by:
- The run ID in the container name
- The `hi.run-id` Docker label
- The test suite container: `headscale-test-suite-{your-run-id}`
```bash
# List containers for a specific run ID
docker ps --filter "label=hi.run-id=20260109-104215-mdjtzx"
# Get your run ID from the test output
# Look for: "Run ID: 20260109-104215-mdjtzx"
```
## Common Failure Patterns and Root Cause Analysis
### CRITICAL MINDSET: Code Issues vs Infrastructure Issues
@@ -250,10 +341,10 @@ require.NotNil(t, targetNode, "should find expected node")
- **Detection**: No progress in logs for >2 minutes during initialization
- **Solution**: `docker system prune -f` and retry
3. **Docker Port Conflicts**: Multiple tests trying to use same ports
- **Pattern**: "bind: address already in use" errors
- **Detection**: Port binding failures in Docker logs
- **Solution**: Only run ONE test at a time
3. **Docker Resource Exhaustion**: Too many concurrent tests overwhelming system
- **Pattern**: Container creation timeouts, OOM kills, slow test execution
- **Detection**: System load high, Docker daemon slow to respond
- **Solution**: Reduce number of concurrent tests, wait for completion before starting more
**CODE ISSUES (99% of failures)**:
1. **Route Approval Process Failures**: Routes not getting approved when they should be
@@ -273,12 +364,22 @@ require.NotNil(t, targetNode, "should find expected node")
### Critical Test Environment Setup
**Pre-Test Cleanup (MANDATORY)**:
**Pre-Test Cleanup**:
The test runner automatically handles cleanup:
- **Before test**: Removes only stale (stopped/exited) containers - does NOT affect running tests
- **After test**: Removes only containers belonging to the specific run ID
```bash
# ALWAYS run this before each test
# Only clean old log directories if disk space is low
rm -rf control_logs/202507*
docker system prune -f
df -h # Verify sufficient disk space
# SAFE: Clean only stale/stopped containers (does not affect running tests)
# The test runner does this automatically via cleanupStaleTestContainers()
# ⚠️ DANGEROUS: Only use when NO tests are running
docker system prune -f
```
**Environment Verification**:
@@ -286,8 +387,8 @@ df -h # Verify sufficient disk space
# Verify system readiness
go run ./cmd/hi doctor
# Check for running containers that might conflict
docker ps
# Check what tests are currently running (ALWAYS check before global cleanup)
docker ps --filter "name=headscale-test-suite" --format "{{.Names}}"
```
### Specific Test Categories and Known Issues
@@ -756,8 +857,14 @@ assert.EventuallyWithT(t, func(c *assert.CollectT) {
- **Why security focus**: Integration tests are the last line of defense against security regressions
- **EventuallyWithT Usage**: Proper use prevents race conditions without weakening security assertions
6. **Concurrent Execution Awareness**: Respect run ID isolation and never interfere with other agents' test sessions. Each test run has a unique run ID - only clean up YOUR containers (by run ID label), never perform global cleanup while tests may be running.
- **Why this matters**: Multiple agents/users may run tests concurrently on the same Docker daemon
- **Key Rule**: NEVER use global container cleanup commands - the test runner handles cleanup automatically per run ID
**CRITICAL PRINCIPLE**: Test expectations are sacred contracts that define correct system behavior. When tests fail, fix the code to match the test, never change the test to match broken code. Only timing and observability improvements are allowed - business logic expectations are immutable.
**ISOLATION PRINCIPLE**: Each test run is isolated by its unique Run ID. Never interfere with other test sessions. The system handles cleanup automatically - manual global cleanup commands are forbidden when other tests may be running.
**EventuallyWithT PRINCIPLE**: Every external call to headscale server or tailscale client must be wrapped in EventuallyWithT. Follow the five key rules strictly: one external call per block, proper variable scoping, no nesting, use CollectT for assertions, and provide descriptive messages.
**Remember**: Test failures are usually code issues in Headscale that need to be fixed, not infrastructure problems to be ignored. Use the specific debugging workflows and failure patterns documented above to efficiently identify root causes. Infrastructure issues have very specific signatures - everything else is code-related.

View File

@@ -165,6 +165,7 @@ jobs:
- TestPreAuthKeyCommandWithoutExpiry
- TestPreAuthKeyCommandReusableEphemeral
- TestPreAuthKeyCorrectUserLoggedInCommand
- TestTaggedNodesCLIOutput
- TestApiKeyCommand
- TestNodeCommand
- TestNodeExpireCommand

View File

@@ -405,13 +405,29 @@ go run ./cmd/hi run "TestName" --postgres
# Pattern matching for related tests
go run ./cmd/hi run "TestPattern*"
# Run multiple tests concurrently (each gets isolated run ID)
go run ./cmd/hi run "TestPingAllByIP" &
go run ./cmd/hi run "TestACLAllowUserDst" &
go run ./cmd/hi run "TestOIDCAuthenticationPingAll" &
```
**Concurrent Execution Support**:
The test runner supports running multiple tests concurrently on the same Docker daemon:
- Each test run gets a **unique Run ID** (format: `YYYYMMDD-HHMMSS-{6-char-hash}`)
- All containers are labeled with `hi.run-id` for isolation
- Container names include the run ID for easy identification (e.g., `ts-{runID}-1-74-{hash}`)
- Dynamic port allocation prevents port conflicts between concurrent runs
- Cleanup only affects containers belonging to the specific run ID
- Log directories are isolated per run: `control_logs/{runID}/`
**Critical Notes**:
- Only ONE test can run at a time (Docker port conflicts)
- Tests generate ~100MB of logs per run in `control_logs/`
- Clean environment before each test: `sudo rm -rf control_logs/202* && docker system prune -f`
- Running many tests concurrently may cause resource contention (CPU/memory)
- Clean stale containers periodically: `docker system prune -f`
### Test Artifacts Location

View File

@@ -18,9 +18,11 @@ import (
)
// cleanupBeforeTest performs cleanup operations before running tests.
// Only removes stale (stopped/exited) test containers to avoid interfering with concurrent test runs.
func cleanupBeforeTest(ctx context.Context) error {
if err := killTestContainers(ctx); err != nil {
return fmt.Errorf("failed to kill test containers: %w", err)
err := cleanupStaleTestContainers(ctx)
if err != nil {
return fmt.Errorf("failed to clean stale test containers: %w", err)
}
if err := pruneDockerNetworks(ctx); err != nil {
@@ -30,11 +32,25 @@ func cleanupBeforeTest(ctx context.Context) error {
return nil
}
// cleanupAfterTest removes the test container after completion.
func cleanupAfterTest(ctx context.Context, cli *client.Client, containerID string) error {
return cli.ContainerRemove(ctx, containerID, container.RemoveOptions{
// cleanupAfterTest removes the test container and all associated integration test containers for the run.
func cleanupAfterTest(ctx context.Context, cli *client.Client, containerID, runID string) error {
// Remove the main test container
err := cli.ContainerRemove(ctx, containerID, container.RemoveOptions{
Force: true,
})
if err != nil {
return fmt.Errorf("failed to remove test container: %w", err)
}
// Clean up integration test containers for this run only
if runID != "" {
err := killTestContainersByRunID(ctx, runID)
if err != nil {
return fmt.Errorf("failed to clean up containers for run %s: %w", runID, err)
}
}
return nil
}
// killTestContainers terminates and removes all test containers.
@@ -87,6 +103,100 @@ func killTestContainers(ctx context.Context) error {
return nil
}
// killTestContainersByRunID terminates and removes all test containers for a specific run ID.
// This function filters containers by the hi.run-id label to only affect containers
// belonging to the specified test run, leaving other concurrent test runs untouched.
func killTestContainersByRunID(ctx context.Context, runID string) error {
cli, err := createDockerClient()
if err != nil {
return fmt.Errorf("failed to create Docker client: %w", err)
}
defer cli.Close()
// Filter containers by hi.run-id label
containers, err := cli.ContainerList(ctx, container.ListOptions{
All: true,
Filters: filters.NewArgs(
filters.Arg("label", "hi.run-id="+runID),
),
})
if err != nil {
return fmt.Errorf("failed to list containers for run %s: %w", runID, err)
}
removed := 0
for _, cont := range containers {
// Kill the container if it's running
if cont.State == "running" {
_ = cli.ContainerKill(ctx, cont.ID, "KILL")
}
// Remove the container with retry logic
if removeContainerWithRetry(ctx, cli, cont.ID) {
removed++
}
}
if removed > 0 {
fmt.Printf("Removed %d containers for run ID %s\n", removed, runID)
}
return nil
}
// cleanupStaleTestContainers removes stopped/exited test containers without affecting running tests.
// This is useful for cleaning up leftover containers from previous crashed or interrupted test runs
// without interfering with currently running concurrent tests.
func cleanupStaleTestContainers(ctx context.Context) error {
cli, err := createDockerClient()
if err != nil {
return fmt.Errorf("failed to create Docker client: %w", err)
}
defer cli.Close()
// Only get stopped/exited containers
containers, err := cli.ContainerList(ctx, container.ListOptions{
All: true,
Filters: filters.NewArgs(
filters.Arg("status", "exited"),
filters.Arg("status", "dead"),
),
})
if err != nil {
return fmt.Errorf("failed to list stopped containers: %w", err)
}
removed := 0
for _, cont := range containers {
// Only remove containers that look like test containers
shouldRemove := false
for _, name := range cont.Names {
if strings.Contains(name, "headscale-test-suite") ||
strings.Contains(name, "hs-") ||
strings.Contains(name, "ts-") ||
strings.Contains(name, "derp-") {
shouldRemove = true
break
}
}
if shouldRemove {
if removeContainerWithRetry(ctx, cli, cont.ID) {
removed++
}
}
}
if removed > 0 {
fmt.Printf("Removed %d stale test containers\n", removed)
}
return nil
}
const (
containerRemoveInitialInterval = 100 * time.Millisecond
containerRemoveMaxElapsedTime = 2 * time.Second

View File

@@ -26,93 +26,8 @@ var (
ErrTestFailed = errors.New("test failed")
ErrUnexpectedContainerWait = errors.New("unexpected end of container wait")
ErrNoDockerContext = errors.New("no docker context found")
ErrAnotherRunInProgress = errors.New("another integration test run is already in progress")
)
// RunningTestInfo contains information about a currently running integration test.
type RunningTestInfo struct {
RunID string
ContainerID string
ContainerName string
StartTime time.Time
Duration time.Duration
TestPattern string
}
// ErrNoRunningTests indicates that no integration test is currently running.
var ErrNoRunningTests = errors.New("no running tests found")
// checkForRunningTests checks if there's already an integration test running.
// Returns ErrNoRunningTests if no test is running, or RunningTestInfo with details about the running test.
func checkForRunningTests(ctx context.Context) (*RunningTestInfo, error) {
cli, err := createDockerClient()
if err != nil {
return nil, fmt.Errorf("failed to create Docker client: %w", err)
}
defer cli.Close()
// List all running containers
containers, err := cli.ContainerList(ctx, container.ListOptions{
All: false, // Only running containers
})
if err != nil {
return nil, fmt.Errorf("failed to list containers: %w", err)
}
// Look for containers with hi.test-type=test-runner label
for _, cont := range containers {
if cont.Labels != nil && cont.Labels["hi.test-type"] == "test-runner" {
// Found a running test runner container
runID := cont.Labels["hi.run-id"]
containerName := ""
for _, name := range cont.Names {
containerName = strings.TrimPrefix(name, "/")
break
}
// Get more details via inspection
inspect, err := cli.ContainerInspect(ctx, cont.ID)
if err != nil {
// Return basic info if inspection fails
return &RunningTestInfo{
RunID: runID,
ContainerID: cont.ID,
ContainerName: containerName,
}, nil
}
startTime, _ := time.Parse(time.RFC3339Nano, inspect.State.StartedAt)
duration := time.Since(startTime)
// Try to extract test pattern from command
testPattern := ""
if len(inspect.Config.Cmd) > 0 {
for i, arg := range inspect.Config.Cmd {
if arg == "-run" && i+1 < len(inspect.Config.Cmd) {
testPattern = inspect.Config.Cmd[i+1]
break
}
}
}
return &RunningTestInfo{
RunID: runID,
ContainerID: cont.ID,
ContainerName: containerName,
StartTime: startTime,
Duration: duration,
TestPattern: testPattern,
}, nil
}
}
return nil, ErrNoRunningTests
}
// runTestContainer executes integration tests in a Docker container.
func runTestContainer(ctx context.Context, config *RunConfig) error {
cli, err := createDockerClient()
@@ -174,6 +89,9 @@ func runTestContainer(ctx context.Context, config *RunConfig) error {
}
log.Printf("Starting test: %s", config.TestPattern)
log.Printf("Run ID: %s", runID)
log.Printf("Monitor with: docker logs -f %s", containerName)
log.Printf("Logs directory: %s", logsDir)
// Start stats collection for container resource monitoring (if enabled)
var statsCollector *StatsCollector
@@ -234,9 +152,12 @@ func runTestContainer(ctx context.Context, config *RunConfig) error {
shouldCleanup := config.CleanAfter && (!config.KeepOnFailure || exitCode == 0)
if shouldCleanup {
if config.Verbose {
log.Printf("Running post-test cleanup...")
log.Printf("Running post-test cleanup for run %s...", runID)
}
if cleanErr := cleanupAfterTest(ctx, cli, resp.ID); cleanErr != nil && config.Verbose {
cleanErr := cleanupAfterTest(ctx, cli, resp.ID, runID)
if cleanErr != nil && config.Verbose {
log.Printf("Warning: post-test cleanup failed: %v", cleanErr)
}

View File

@@ -6,7 +6,6 @@ import (
"log"
"os"
"path/filepath"
"strings"
"time"
"github.com/creachadair/command"
@@ -14,65 +13,13 @@ import (
var ErrTestPatternRequired = errors.New("test pattern is required as first argument or use --test flag")
// formatRunningTestError creates a detailed error message about a running test.
func formatRunningTestError(info *RunningTestInfo) error {
var msg strings.Builder
msg.WriteString("\n")
msg.WriteString("╔══════════════════════════════════════════════════════════════════╗\n")
msg.WriteString("║ Another integration test run is already in progress! ║\n")
msg.WriteString("╚══════════════════════════════════════════════════════════════════╝\n")
msg.WriteString("\n")
msg.WriteString("Running test details:\n")
msg.WriteString(fmt.Sprintf(" Run ID: %s\n", info.RunID))
msg.WriteString(fmt.Sprintf(" Container: %s\n", info.ContainerName))
if info.TestPattern != "" {
msg.WriteString(fmt.Sprintf(" Test: %s\n", info.TestPattern))
}
if !info.StartTime.IsZero() {
msg.WriteString(fmt.Sprintf(" Started: %s\n", info.StartTime.Format("2006-01-02 15:04:05")))
msg.WriteString(fmt.Sprintf(" Running for: %s\n", formatDuration(info.Duration)))
}
msg.WriteString("\n")
msg.WriteString("Please wait for the current test to complete, or stop it with:\n")
msg.WriteString(" go run ./cmd/hi clean containers\n")
msg.WriteString("\n")
msg.WriteString("To monitor the running test:\n")
msg.WriteString(fmt.Sprintf(" docker logs -f %s\n", info.ContainerName))
return fmt.Errorf("%w\n%s", ErrAnotherRunInProgress, msg.String())
}
const secondsPerMinute = 60
// formatDuration formats a duration in a human-readable way.
func formatDuration(d time.Duration) string {
if d < time.Minute {
return fmt.Sprintf("%d seconds", int(d.Seconds()))
}
if d < time.Hour {
minutes := int(d.Minutes())
seconds := int(d.Seconds()) % secondsPerMinute
return fmt.Sprintf("%d minutes, %d seconds", minutes, seconds)
}
hours := int(d.Hours())
minutes := int(d.Minutes()) % secondsPerMinute
return fmt.Sprintf("%d hours, %d minutes", hours, minutes)
}
type RunConfig struct {
TestPattern string `flag:"test,Test pattern to run"`
Timeout time.Duration `flag:"timeout,default=120m,Test timeout"`
FailFast bool `flag:"failfast,default=true,Stop on first test failure"`
UsePostgres bool `flag:"postgres,default=false,Use PostgreSQL instead of SQLite"`
GoVersion string `flag:"go-version,Go version to use (auto-detected from go.mod)"`
CleanBefore bool `flag:"clean-before,default=true,Clean resources before test"`
CleanBefore bool `flag:"clean-before,default=true,Clean stale resources before test"`
CleanAfter bool `flag:"clean-after,default=true,Clean resources after test"`
KeepOnFailure bool `flag:"keep-on-failure,default=false,Keep containers on test failure"`
LogsDir string `flag:"logs-dir,default=control_logs,Control logs directory"`
@@ -80,7 +27,6 @@ type RunConfig struct {
Stats bool `flag:"stats,default=false,Collect and display container resource usage statistics"`
HSMemoryLimit float64 `flag:"hs-memory-limit,default=0,Fail test if any Headscale container exceeds this memory limit in MB (0 = disabled)"`
TSMemoryLimit float64 `flag:"ts-memory-limit,default=0,Fail test if any Tailscale container exceeds this memory limit in MB (0 = disabled)"`
Force bool `flag:"force,default=false,Kill any running test and start a new one"`
}
// runIntegrationTest executes the integration test workflow.
@@ -98,23 +44,6 @@ func runIntegrationTest(env *command.Env) error {
runConfig.GoVersion = detectGoVersion()
}
// Check if another test run is already in progress
runningTest, err := checkForRunningTests(env.Context())
if err != nil && !errors.Is(err, ErrNoRunningTests) {
log.Printf("Warning: failed to check for running tests: %v", err)
} else if runningTest != nil {
if runConfig.Force {
log.Printf("Force flag set, killing existing test run: %s", runningTest.RunID)
err = killTestContainers(env.Context())
if err != nil {
return fmt.Errorf("failed to kill existing test containers: %w", err)
}
} else {
return formatRunningTestError(runningTest)
}
}
// Run pre-flight checks
if runConfig.Verbose {
log.Printf("Running pre-flight system checks...")

View File

@@ -7,6 +7,7 @@
This page collects third-party tools, client libraries, and scripts related to headscale.
- [headscale-operator](https://github.com/infradohq/headscale-operator) - Headscale Kubernetes Operator
- [tailscale-manager](https://github.com/singlestore-labs/tailscale-manager) - Dynamically manage Tailscale route
advertisements
- [headscalebacktosqlite](https://github.com/bigbozza/headscalebacktosqlite) - Migrate headscale from PostgreSQL back to SQLite

flake.lock (generated)
View File

@@ -20,11 +20,11 @@
},
"nixpkgs": {
"locked": {
"lastModified": 1760533177,
"narHash": "sha256-OwM1sFustLHx+xmTymhucZuNhtq98fHIbfO8Swm5L8A=",
"lastModified": 1766840161,
"narHash": "sha256-Ss/LHpJJsng8vz1Pe33RSGIWUOcqM1fjrehjUkdrWio=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "35f590344ff791e6b1d6d6b8f3523467c9217caf",
"rev": "3edc4a30ed3903fdf6f90c837f961fa6b49582d1",
"type": "github"
},
"original": {

View File

@@ -229,7 +229,7 @@
apps.default = apps.headscale;
checks = {
headscale = pkgs.nixosTest (import ./nix/tests/headscale.nix);
headscale = pkgs.testers.nixosTest (import ./nix/tests/headscale.nix);
};
});
}

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/apikey.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/device.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/headscale.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go-grpc. DO NOT EDIT.
// versions:
// - protoc-gen-go-grpc v1.5.1
// - protoc-gen-go-grpc v1.6.0
// - protoc (unknown)
// source: headscale/v1/headscale.proto
@@ -387,79 +387,79 @@ type HeadscaleServiceServer interface {
type UnimplementedHeadscaleServiceServer struct{}
func (UnimplementedHeadscaleServiceServer) CreateUser(context.Context, *CreateUserRequest) (*CreateUserResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method CreateUser not implemented")
return nil, status.Error(codes.Unimplemented, "method CreateUser not implemented")
}
func (UnimplementedHeadscaleServiceServer) RenameUser(context.Context, *RenameUserRequest) (*RenameUserResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method RenameUser not implemented")
return nil, status.Error(codes.Unimplemented, "method RenameUser not implemented")
}
func (UnimplementedHeadscaleServiceServer) DeleteUser(context.Context, *DeleteUserRequest) (*DeleteUserResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method DeleteUser not implemented")
return nil, status.Error(codes.Unimplemented, "method DeleteUser not implemented")
}
func (UnimplementedHeadscaleServiceServer) ListUsers(context.Context, *ListUsersRequest) (*ListUsersResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ListUsers not implemented")
return nil, status.Error(codes.Unimplemented, "method ListUsers not implemented")
}
func (UnimplementedHeadscaleServiceServer) CreatePreAuthKey(context.Context, *CreatePreAuthKeyRequest) (*CreatePreAuthKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method CreatePreAuthKey not implemented")
return nil, status.Error(codes.Unimplemented, "method CreatePreAuthKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) ExpirePreAuthKey(context.Context, *ExpirePreAuthKeyRequest) (*ExpirePreAuthKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ExpirePreAuthKey not implemented")
return nil, status.Error(codes.Unimplemented, "method ExpirePreAuthKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) DeletePreAuthKey(context.Context, *DeletePreAuthKeyRequest) (*DeletePreAuthKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method DeletePreAuthKey not implemented")
return nil, status.Error(codes.Unimplemented, "method DeletePreAuthKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) ListPreAuthKeys(context.Context, *ListPreAuthKeysRequest) (*ListPreAuthKeysResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ListPreAuthKeys not implemented")
return nil, status.Error(codes.Unimplemented, "method ListPreAuthKeys not implemented")
}
func (UnimplementedHeadscaleServiceServer) DebugCreateNode(context.Context, *DebugCreateNodeRequest) (*DebugCreateNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method DebugCreateNode not implemented")
return nil, status.Error(codes.Unimplemented, "method DebugCreateNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) GetNode(context.Context, *GetNodeRequest) (*GetNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method GetNode not implemented")
return nil, status.Error(codes.Unimplemented, "method GetNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) SetTags(context.Context, *SetTagsRequest) (*SetTagsResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method SetTags not implemented")
return nil, status.Error(codes.Unimplemented, "method SetTags not implemented")
}
func (UnimplementedHeadscaleServiceServer) SetApprovedRoutes(context.Context, *SetApprovedRoutesRequest) (*SetApprovedRoutesResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method SetApprovedRoutes not implemented")
return nil, status.Error(codes.Unimplemented, "method SetApprovedRoutes not implemented")
}
func (UnimplementedHeadscaleServiceServer) RegisterNode(context.Context, *RegisterNodeRequest) (*RegisterNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method RegisterNode not implemented")
return nil, status.Error(codes.Unimplemented, "method RegisterNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) DeleteNode(context.Context, *DeleteNodeRequest) (*DeleteNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method DeleteNode not implemented")
return nil, status.Error(codes.Unimplemented, "method DeleteNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) ExpireNode(context.Context, *ExpireNodeRequest) (*ExpireNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ExpireNode not implemented")
return nil, status.Error(codes.Unimplemented, "method ExpireNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) RenameNode(context.Context, *RenameNodeRequest) (*RenameNodeResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method RenameNode not implemented")
return nil, status.Error(codes.Unimplemented, "method RenameNode not implemented")
}
func (UnimplementedHeadscaleServiceServer) ListNodes(context.Context, *ListNodesRequest) (*ListNodesResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ListNodes not implemented")
return nil, status.Error(codes.Unimplemented, "method ListNodes not implemented")
}
func (UnimplementedHeadscaleServiceServer) BackfillNodeIPs(context.Context, *BackfillNodeIPsRequest) (*BackfillNodeIPsResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method BackfillNodeIPs not implemented")
return nil, status.Error(codes.Unimplemented, "method BackfillNodeIPs not implemented")
}
func (UnimplementedHeadscaleServiceServer) CreateApiKey(context.Context, *CreateApiKeyRequest) (*CreateApiKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method CreateApiKey not implemented")
return nil, status.Error(codes.Unimplemented, "method CreateApiKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) ExpireApiKey(context.Context, *ExpireApiKeyRequest) (*ExpireApiKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ExpireApiKey not implemented")
return nil, status.Error(codes.Unimplemented, "method ExpireApiKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) ListApiKeys(context.Context, *ListApiKeysRequest) (*ListApiKeysResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method ListApiKeys not implemented")
return nil, status.Error(codes.Unimplemented, "method ListApiKeys not implemented")
}
func (UnimplementedHeadscaleServiceServer) DeleteApiKey(context.Context, *DeleteApiKeyRequest) (*DeleteApiKeyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method DeleteApiKey not implemented")
return nil, status.Error(codes.Unimplemented, "method DeleteApiKey not implemented")
}
func (UnimplementedHeadscaleServiceServer) GetPolicy(context.Context, *GetPolicyRequest) (*GetPolicyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method GetPolicy not implemented")
return nil, status.Error(codes.Unimplemented, "method GetPolicy not implemented")
}
func (UnimplementedHeadscaleServiceServer) SetPolicy(context.Context, *SetPolicyRequest) (*SetPolicyResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method SetPolicy not implemented")
return nil, status.Error(codes.Unimplemented, "method SetPolicy not implemented")
}
func (UnimplementedHeadscaleServiceServer) Health(context.Context, *HealthRequest) (*HealthResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method Health not implemented")
return nil, status.Error(codes.Unimplemented, "method Health not implemented")
}
func (UnimplementedHeadscaleServiceServer) mustEmbedUnimplementedHeadscaleServiceServer() {}
func (UnimplementedHeadscaleServiceServer) testEmbeddedByValue() {}
@@ -472,7 +472,7 @@ type UnsafeHeadscaleServiceServer interface {
}
func RegisterHeadscaleServiceServer(s grpc.ServiceRegistrar, srv HeadscaleServiceServer) {
// If the following call pancis, it indicates UnimplementedHeadscaleServiceServer was
// If the following call panics, it indicates UnimplementedHeadscaleServiceServer was
// embedded by pointer and is nil. This will cause panics if an
// unimplemented method is ever invoked, so we test this at initialization
// time to prevent it from happening at runtime later due to I/O.

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/node.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/policy.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/preauthkey.proto

View File

@@ -1,6 +1,6 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.10
// protoc-gen-go v1.36.11
// protoc (unknown)
// source: headscale/v1/user.proto

View File

@@ -247,9 +247,9 @@ func nodeToRegisterResponse(node types.NodeView) *tailcfg.RegisterResponse {
if node.IsTagged() {
resp.User = types.TaggedDevices.View().TailscaleUser()
resp.Login = types.TaggedDevices.View().TailscaleLogin()
} else if node.UserView().Valid() {
resp.User = node.UserView().TailscaleUser()
resp.Login = node.UserView().TailscaleLogin()
} else if node.Owner().Valid() {
resp.User = node.Owner().TailscaleUser()
resp.Login = node.Owner().TailscaleLogin()
}
return resp
@@ -389,8 +389,8 @@ func (h *Headscale) handleRegisterWithAuthKey(
resp := &tailcfg.RegisterResponse{
MachineAuthorized: true,
NodeKeyExpired: node.IsExpired(),
User: node.UserView().TailscaleUser(),
Login: node.UserView().TailscaleLogin(),
User: node.Owner().TailscaleUser(),
Login: node.Owner().TailscaleLogin(),
}
log.Trace().

View File

@@ -16,6 +16,7 @@ var (
ErrInvalidAutoVacuum = errors.New("invalid auto_vacuum")
ErrWALAutocheckpoint = errors.New("wal_autocheckpoint must be >= -1")
ErrInvalidSynchronous = errors.New("invalid synchronous")
ErrInvalidTxLock = errors.New("invalid txlock")
)
const (
@@ -225,6 +226,62 @@ func (s Synchronous) String() string {
return string(s)
}
// TxLock represents SQLite transaction lock mode.
// Transaction lock mode determines when write locks are acquired during transactions.
//
// Lock Acquisition Behavior:
//
// DEFERRED - SQLite default, acquire lock lazily:
// - Transaction starts without any lock
// - First read acquires SHARED lock
// - First write attempts to upgrade to RESERVED lock
// - If another transaction holds RESERVED: SQLITE_BUSY (potential deadlock)
// - Can cause deadlocks when multiple connections attempt concurrent writes
//
// IMMEDIATE - Recommended for write-heavy workloads:
// - Transaction immediately acquires RESERVED lock at BEGIN
// - If lock unavailable, waits up to busy_timeout before failing
// - Other writers queue orderly instead of deadlocking
// - Prevents the upgrade-lock deadlock scenario
// - Slight overhead for read-only transactions that don't need locks
//
// EXCLUSIVE - Maximum isolation:
// - Transaction immediately acquires EXCLUSIVE lock at BEGIN
// - No other connections can read or write
// - Highest isolation but lowest concurrency
// - Rarely needed in practice
type TxLock string
const (
// TxLockDeferred acquires locks lazily (SQLite default).
// Risk of SQLITE_BUSY deadlocks with concurrent writers. Use for read-heavy workloads.
TxLockDeferred TxLock = "deferred"
// TxLockImmediate acquires write lock immediately (RECOMMENDED for production).
// Prevents deadlocks by acquiring RESERVED lock at transaction start.
// Writers queue orderly, respecting busy_timeout.
TxLockImmediate TxLock = "immediate"
// TxLockExclusive acquires exclusive lock immediately.
// Maximum isolation, no concurrent reads or writes. Rarely needed.
TxLockExclusive TxLock = "exclusive"
)
// IsValid returns true if the TxLock is valid.
func (t TxLock) IsValid() bool {
switch t {
case TxLockDeferred, TxLockImmediate, TxLockExclusive, "":
return true
default:
return false
}
}
// String returns the string representation.
func (t TxLock) String() string {
return string(t)
}
// Config holds SQLite database configuration with type-safe enums.
// This configuration balances performance, durability, and operational requirements
// for Headscale's SQLite database usage patterns.
@@ -236,6 +293,7 @@ type Config struct {
WALAutocheckpoint int // pages (-1 = default/not set, 0 = disabled, >0 = enabled)
Synchronous Synchronous // synchronous mode (affects durability vs performance)
ForeignKeys bool // enable foreign key constraints (data integrity)
TxLock TxLock // transaction lock mode (affects write concurrency)
}
// Default returns the production configuration optimized for Headscale's usage patterns.
@@ -244,6 +302,7 @@ type Config struct {
// - Data durability with good performance (NORMAL synchronous)
// - Automatic space management (INCREMENTAL auto-vacuum)
// - Data integrity (foreign key constraints enabled)
// - Safe concurrent writes (IMMEDIATE transaction lock)
// - Reasonable timeout for busy database scenarios (10s)
func Default(path string) *Config {
return &Config{
@@ -254,6 +313,7 @@ func Default(path string) *Config {
WALAutocheckpoint: 1000,
Synchronous: SynchronousNormal,
ForeignKeys: true,
TxLock: TxLockImmediate,
}
}
@@ -292,6 +352,10 @@ func (c *Config) Validate() error {
return fmt.Errorf("%w: %s", ErrInvalidSynchronous, c.Synchronous)
}
if c.TxLock != "" && !c.TxLock.IsValid() {
return fmt.Errorf("%w: %s", ErrInvalidTxLock, c.TxLock)
}
return nil
}
@@ -332,12 +396,20 @@ func (c *Config) ToURL() (string, error) {
baseURL = "file:" + c.Path
}
// Add parameters without encoding = signs
if len(pragmas) > 0 {
var queryParts []string
for _, pragma := range pragmas {
queryParts = append(queryParts, "_pragma="+pragma)
}
// Build query parameters
queryParts := make([]string, 0, 1+len(pragmas))
// Add _txlock first (it's a connection parameter, not a pragma)
if c.TxLock != "" {
queryParts = append(queryParts, "_txlock="+string(c.TxLock))
}
// Add pragma parameters
for _, pragma := range pragmas {
queryParts = append(queryParts, "_pragma="+pragma)
}
if len(queryParts) > 0 {
baseURL += "?" + strings.Join(queryParts, "&")
}
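As a usage sketch of the new option (the import path is assumed; the expected DSN mirrors the updated "default config includes txlock immediate" test case below):

```go
package main

import (
	"fmt"
	"log"

	// Assumed import path for the sqliteconfig package shown in this diff.
	"github.com/juanfont/headscale/hscontrol/db/sqliteconfig"
)

func main() {
	// Default() now sets TxLock to "immediate".
	cfg := sqliteconfig.Default("/var/lib/headscale/db.sqlite")

	dsn, err := cfg.ToURL()
	if err != nil {
		log.Fatal(err)
	}

	// _txlock is emitted first because it is a connection parameter,
	// followed by the _pragma parameters, e.g.:
	// file:/var/lib/headscale/db.sqlite?_txlock=immediate&_pragma=busy_timeout=10000&...
	fmt.Println(dsn)
}
```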

View File

@@ -71,6 +71,52 @@ func TestSynchronous(t *testing.T) {
}
}
func TestTxLock(t *testing.T) {
tests := []struct {
mode TxLock
valid bool
}{
{TxLockDeferred, true},
{TxLockImmediate, true},
{TxLockExclusive, true},
{TxLock(""), true}, // empty is valid (uses driver default)
{TxLock("IMMEDIATE"), false}, // uppercase is invalid
{TxLock("INVALID"), false},
}
for _, tt := range tests {
name := string(tt.mode)
if name == "" {
name = "empty"
}
t.Run(name, func(t *testing.T) {
if got := tt.mode.IsValid(); got != tt.valid {
t.Errorf("TxLock(%q).IsValid() = %v, want %v", tt.mode, got, tt.valid)
}
})
}
}
func TestTxLockString(t *testing.T) {
tests := []struct {
mode TxLock
want string
}{
{TxLockDeferred, "deferred"},
{TxLockImmediate, "immediate"},
{TxLockExclusive, "exclusive"},
}
for _, tt := range tests {
t.Run(tt.want, func(t *testing.T) {
if got := tt.mode.String(); got != tt.want {
t.Errorf("TxLock.String() = %q, want %q", got, tt.want)
}
})
}
}
func TestConfigValidate(t *testing.T) {
tests := []struct {
name string
@@ -104,6 +150,21 @@ func TestConfigValidate(t *testing.T) {
},
wantErr: true,
},
{
name: "invalid txlock",
config: &Config{
Path: "/path/to/db.sqlite",
TxLock: TxLock("INVALID"),
},
wantErr: true,
},
{
name: "valid txlock immediate",
config: &Config{
Path: "/path/to/db.sqlite",
TxLock: TxLockImmediate,
},
},
}
for _, tt := range tests {
@@ -123,9 +184,9 @@ func TestConfigToURL(t *testing.T) {
want string
}{
{
name: "default config",
name: "default config includes txlock immediate",
config: Default("/path/to/db.sqlite"),
want: "file:/path/to/db.sqlite?_pragma=busy_timeout=10000&_pragma=journal_mode=WAL&_pragma=auto_vacuum=INCREMENTAL&_pragma=wal_autocheckpoint=1000&_pragma=synchronous=NORMAL&_pragma=foreign_keys=ON",
want: "file:/path/to/db.sqlite?_txlock=immediate&_pragma=busy_timeout=10000&_pragma=journal_mode=WAL&_pragma=auto_vacuum=INCREMENTAL&_pragma=wal_autocheckpoint=1000&_pragma=synchronous=NORMAL&_pragma=foreign_keys=ON",
},
{
name: "memory config",
@@ -183,6 +244,47 @@ func TestConfigToURL(t *testing.T) {
},
want: "file:/full.db?_pragma=busy_timeout=15000&_pragma=journal_mode=WAL&_pragma=auto_vacuum=FULL&_pragma=wal_autocheckpoint=1000&_pragma=synchronous=EXTRA&_pragma=foreign_keys=ON",
},
{
name: "with txlock immediate",
config: &Config{
Path: "/test.db",
BusyTimeout: 5000,
TxLock: TxLockImmediate,
WALAutocheckpoint: -1,
ForeignKeys: true,
},
want: "file:/test.db?_txlock=immediate&_pragma=busy_timeout=5000&_pragma=foreign_keys=ON",
},
{
name: "with txlock deferred",
config: &Config{
Path: "/test.db",
TxLock: TxLockDeferred,
WALAutocheckpoint: -1,
ForeignKeys: true,
},
want: "file:/test.db?_txlock=deferred&_pragma=foreign_keys=ON",
},
{
name: "with txlock exclusive",
config: &Config{
Path: "/test.db",
TxLock: TxLockExclusive,
WALAutocheckpoint: -1,
},
want: "file:/test.db?_txlock=exclusive",
},
{
name: "empty txlock omitted from URL",
config: &Config{
Path: "/test.db",
TxLock: "",
BusyTimeout: 1000,
WALAutocheckpoint: -1,
ForeignKeys: true,
},
want: "file:/test.db?_pragma=busy_timeout=1000&_pragma=foreign_keys=ON",
},
}
for _, tt := range tests {
@@ -209,3 +311,10 @@ func TestConfigToURLInvalid(t *testing.T) {
t.Error("Config.ToURL() with invalid config should return error")
}
}
func TestDefaultConfigHasTxLockImmediate(t *testing.T) {
config := Default("/test.db")
if config.TxLock != TxLockImmediate {
t.Errorf("Default().TxLock = %q, want %q", config.TxLock, TxLockImmediate)
}
}

View File

@@ -69,18 +69,19 @@ func newMapper(
}
}
// generateUserProfiles creates user profiles for MapResponse.
func generateUserProfiles(
node types.NodeView,
peers views.Slice[types.NodeView],
) []tailcfg.UserProfile {
userMap := make(map[uint]*types.UserView)
ids := make([]uint, 0, len(userMap))
user := node.User()
user := node.Owner()
userID := user.Model().ID
userMap[userID] = &user
ids = append(ids, userID)
for _, peer := range peers.All() {
peerUser := peer.User()
peerUser := peer.Owner()
peerUserID := peerUser.Model().ID
userMap[peerUserID] = &peerUser
ids = append(ids, peerUserID)

View File

@@ -78,7 +78,7 @@ func (s *State) DebugOverview() string {
now := time.Now()
for _, node := range allNodes.All() {
if node.Valid() {
userName := node.User().Name()
userName := node.Owner().Name()
userNodeCounts[userName]++
if node.IsOnline().Valid() && node.IsOnline().Get() {
@@ -281,7 +281,7 @@ func (s *State) DebugOverviewJSON() DebugOverviewInfo {
for _, node := range allNodes.All() {
if node.Valid() {
userName := node.User().Name()
userName := node.Owner().Name()
info.Users[userName]++
if node.IsOnline().Valid() && node.IsOnline().Get() {

View File

@@ -509,15 +509,27 @@ func (s *NodeStore) DebugString() string {
sb.WriteString(fmt.Sprintf("Users with Nodes: %d\n", len(snapshot.nodesByUser)))
sb.WriteString("\n")
// User distribution
sb.WriteString("Nodes by User:\n")
// User distribution (shows internal UserID tracking, not display owner)
sb.WriteString("Nodes by Internal User ID:\n")
for userID, nodes := range snapshot.nodesByUser {
if len(nodes) > 0 {
userName := "unknown"
taggedCount := 0
if len(nodes) > 0 && nodes[0].Valid() {
userName = nodes[0].User().Name()
// Count tagged nodes (which have UserID set but are owned by "tagged-devices")
for _, n := range nodes {
if n.IsTagged() {
taggedCount++
}
}
}
if taggedCount > 0 {
sb.WriteString(fmt.Sprintf(" - User %d (%s): %d nodes (%d tagged)\n", userID, userName, len(nodes), taggedCount))
} else {
sb.WriteString(fmt.Sprintf(" - User %d (%s): %d nodes\n", userID, userName, len(nodes)))
}
sb.WriteString(fmt.Sprintf(" - User %d (%s): %d nodes\n", userID, userName, len(nodes)))
}
}
sb.WriteString("\n")

View File

@@ -719,7 +719,13 @@ func (node Node) DebugString() string {
return sb.String()
}
func (nv NodeView) UserView() UserView {
// Owner returns the owner for display purposes.
// For tagged nodes, returns TaggedDevices. For user-owned nodes, returns the user.
func (nv NodeView) Owner() UserView {
if nv.IsTagged() {
return TaggedDevices.View()
}
return nv.User()
}

View File

@@ -659,6 +659,123 @@ func TestPreAuthKeyCorrectUserLoggedInCommand(t *testing.T) {
}, 20*time.Second, 1*time.Second)
}
func TestTaggedNodesCLIOutput(t *testing.T) {
IntegrationSkip(t)
user1 := "user1"
user2 := "user2"
spec := ScenarioSpec{
NodesPerUser: 1,
Users: []string{user1},
}
scenario, err := NewScenario(spec)
require.NoError(t, err)
defer scenario.ShutdownAssertNoPanics(t)
err = scenario.CreateHeadscaleEnv(
[]tsic.Option{},
hsic.WithTestName("tagcli"),
hsic.WithEmbeddedDERPServerOnly(),
hsic.WithTLS(),
)
require.NoError(t, err)
headscale, err := scenario.Headscale()
require.NoError(t, err)
u2, err := headscale.CreateUser(user2)
require.NoError(t, err)
var user2Key v1.PreAuthKey
// Create a tagged PreAuthKey for user2
assert.EventuallyWithT(t, func(c *assert.CollectT) {
err = executeAndUnmarshal(
headscale,
[]string{
"headscale",
"preauthkeys",
"--user",
strconv.FormatUint(u2.GetId(), 10),
"create",
"--reusable",
"--expiration",
"24h",
"--output",
"json",
"--tags",
"tag:test1,tag:test2",
},
&user2Key,
)
assert.NoError(c, err)
}, 10*time.Second, 200*time.Millisecond, "Waiting for user2 tagged preauth key creation")
allClients, err := scenario.ListTailscaleClients()
requireNoErrListClients(t, err)
require.Len(t, allClients, 1)
client := allClients[0]
// Log out from user1
err = client.Logout()
require.NoError(t, err)
err = scenario.WaitForTailscaleLogout()
require.NoError(t, err)
assert.EventuallyWithT(t, func(ct *assert.CollectT) {
status, err := client.Status()
assert.NoError(ct, err)
assert.NotContains(ct, []string{"Starting", "Running"}, status.BackendState,
"Expected node to be logged out, backend state: %s", status.BackendState)
}, 30*time.Second, 2*time.Second)
// Log in with the tagged PreAuthKey (from user2, with tags)
err = client.Login(headscale.GetEndpoint(), user2Key.GetKey())
require.NoError(t, err)
assert.EventuallyWithT(t, func(ct *assert.CollectT) {
status, err := client.Status()
assert.NoError(ct, err)
assert.Equal(ct, "Running", status.BackendState, "Expected node to be logged in, backend state: %s", status.BackendState)
// With tags-as-identity model, tagged nodes show as TaggedDevices user (2147455555)
assert.Equal(ct, "userid:2147455555", status.Self.UserID.String(), "Expected node to be logged in as tagged-devices user")
}, 30*time.Second, 2*time.Second)
// Wait for the second node to appear
var listNodes []*v1.Node
assert.EventuallyWithT(t, func(ct *assert.CollectT) {
var err error
listNodes, err = headscale.ListNodes()
assert.NoError(ct, err)
assert.Len(ct, listNodes, 2, "Should have 2 nodes after re-login with tagged key")
assert.Equal(ct, user1, listNodes[0].GetUser().GetName(), "First node should belong to user1")
assert.Equal(ct, "tagged-devices", listNodes[1].GetUser().GetName(), "Second node should be tagged-devices")
}, 20*time.Second, 1*time.Second)
// Test: tailscale status output should show "tagged-devices" not "userid:2147455555"
// This is the fix for issue #2970 - the Tailscale client should display user-friendly names
assert.EventuallyWithT(t, func(ct *assert.CollectT) {
stdout, stderr, err := client.Execute([]string{"tailscale", "status"})
assert.NoError(ct, err, "tailscale status command should succeed, stderr: %s", stderr)
t.Logf("Tailscale status output:\n%s", stdout)
// The output should contain "tagged-devices" for tagged nodes
assert.Contains(ct, stdout, "tagged-devices", "Tailscale status should show 'tagged-devices' for tagged nodes")
// The output should NOT show the raw numeric userid to the user
assert.NotContains(ct, stdout, "userid:2147455555", "Tailscale status should not show numeric userid for tagged nodes")
}, 20*time.Second, 1*time.Second)
}
func TestApiKeyCommand(t *testing.T) {
IntegrationSkip(t)

View File

@@ -147,7 +147,18 @@ func New(
return nil, err
}
hostname := fmt.Sprintf("derp-%s-%s", strings.ReplaceAll(version, ".", "-"), hash)
// Include run ID in hostname for easier identification of which test run owns this container
runID := dockertestutil.GetIntegrationRunID()
var hostname string
if runID != "" {
// Use last 6 chars of run ID (the random hash part) for brevity
runIDShort := runID[len(runID)-6:]
hostname = fmt.Sprintf("derp-%s-%s-%s", runIDShort, strings.ReplaceAll(version, ".", "-"), hash)
} else {
hostname = fmt.Sprintf("derp-%s-%s", strings.ReplaceAll(version, ".", "-"), hash)
}
tlsCert, tlsKey, err := integrationutil.CreateCertificate(hostname)
if err != nil {
return nil, fmt.Errorf("failed to create certificates for headscale test: %w", err)

View File

@@ -74,6 +74,7 @@ type HeadscaleInContainer struct {
// optional config
port int
extraPorts []string
hostMetricsPort string // Dynamically assigned host port for metrics/pprof access
caCerts [][]byte
hostPortBindings map[string][]string
aclPolicy *policyv2.Policy
@@ -330,7 +331,18 @@ func New(
return nil, err
}
hostname := "hs-" + hash
// Include run ID in hostname for easier identification of which test run owns this container
runID := dockertestutil.GetIntegrationRunID()
var hostname string
if runID != "" {
// Use last 6 chars of run ID (the random hash part) for brevity
runIDShort := runID[len(runID)-6:]
hostname = fmt.Sprintf("hs-%s-%s", runIDShort, hash)
} else {
hostname = "hs-" + hash
}
hsic := &HeadscaleInContainer{
hostname: hostname,
@@ -438,13 +450,13 @@ func New(
Env: env,
}
// Bind metrics port to predictable host port
// Bind metrics port to dynamic host port (kernel assigns free port)
if runOptions.PortBindings == nil {
runOptions.PortBindings = map[docker.Port][]docker.PortBinding{}
}
runOptions.PortBindings["9090/tcp"] = []docker.PortBinding{
{HostPort: "49090"},
{HostPort: "0"}, // Let kernel assign a free port
}
if len(hsic.hostPortBindings) > 0 {
@@ -540,9 +552,14 @@ func New(
hsic.container = container
// Get the dynamically assigned host port for metrics/pprof
hsic.hostMetricsPort = container.GetHostPort("9090/tcp")
log.Printf(
"Ports for %s: metrics/pprof=49090\n",
"Headscale %s metrics available at http://localhost:%s/metrics (debug at http://localhost:%s/debug/)\n",
hsic.hostname,
hsic.hostMetricsPort,
hsic.hostMetricsPort,
)
// Write the CA certificates to the container
@@ -932,6 +949,13 @@ func (t *HeadscaleInContainer) GetPort() string {
return strconv.Itoa(t.port)
}
// GetHostMetricsPort returns the dynamically assigned host port for metrics/pprof access.
// This port can be used by operators to access metrics at http://localhost:{port}/metrics
// and debug endpoints at http://localhost:{port}/debug/ while tests are running.
func (t *HeadscaleInContainer) GetHostMetricsPort() string {
return t.hostMetricsPort
}
// GetHealthEndpoint returns a health endpoint for the HeadscaleInContainer
// instance.
func (t *HeadscaleInContainer) GetHealthEndpoint() string {
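A sketch of how a test could use the new accessor while a scenario is running (obtaining the `*hsic.HeadscaleInContainer` is elided, and the HTTP check itself is an assumption, not project code):

```go
import (
	"fmt"
	"net/http"
	"testing"

	"github.com/juanfont/headscale/integration/hsic" // assumed import path
	"github.com/stretchr/testify/require"
)

// checkMetricsReachable is a hypothetical helper: it builds the metrics URL
// from the dynamically assigned host port and verifies it answers.
func checkMetricsReachable(t *testing.T, hs *hsic.HeadscaleInContainer) {
	t.Helper()

	url := fmt.Sprintf("http://localhost:%s/metrics", hs.GetHostMetricsPort())

	resp, err := http.Get(url) // reachable from the host while the test runs
	require.NoError(t, err)
	defer resp.Body.Close()

	require.Equal(t, http.StatusOK, resp.StatusCode)
}
```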

View File

@@ -247,9 +247,14 @@ func (s *Scenario) AddNetwork(name string) (*dockertest.Network, error) {
// We run the test suite in a docker container that calls a couple of endpoints for
// readiness checks, this ensures that we can run the tests with individual networks
// and have the client reach the different containers
// TODO(kradalby): Can the test-suite be renamed so we can have multiple?
err = dockertestutil.AddContainerToNetwork(s.pool, network, "headscale-test-suite")
// and have the client reach the different containers.
// The container name includes the run ID to support multiple concurrent test runs.
testSuiteName := "headscale-test-suite"
if runID := dockertestutil.GetIntegrationRunID(); runID != "" {
testSuiteName = "headscale-test-suite-" + runID
}
err = dockertestutil.AddContainerToNetwork(s.pool, network, testSuiteName)
if err != nil {
return nil, fmt.Errorf("failed to add test suite container to network: %w", err)
}

View File

@@ -307,7 +307,18 @@ func New(
return nil, err
}
hostname := fmt.Sprintf("ts-%s-%s", strings.ReplaceAll(version, ".", "-"), hash)
// Include run ID in hostname for easier identification of which test run owns this container
runID := dockertestutil.GetIntegrationRunID()
var hostname string
if runID != "" {
// Use last 6 chars of run ID (the random hash part) for brevity
runIDShort := runID[len(runID)-6:]
hostname = fmt.Sprintf("ts-%s-%s-%s", runIDShort, strings.ReplaceAll(version, ".", "-"), hash)
} else {
hostname = fmt.Sprintf("ts-%s-%s", strings.ReplaceAll(version, ".", "-"), hash)
}
tsic := &TailscaleInContainer{
version: version,