Headscale v0.27.0 not storing/propagating node endpoints - all connections forced through DERP relay #1134

Closed
opened 2025-12-29 02:28:28 +01:00 by adam · 5 comments
Owner

Originally created by @bradgarrison on GitHub (Nov 1, 2025).

Is this a support request?

  • This is not a support request

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Headscale v0.27.0 is not storing or propagating endpoint information to peers, causing all connections to relay through DERP servers instead of establishing direct peer-to-peer connections. This occurs even when nodes are on the same local subnet.
Evidence:

Headscale database shows null endpoints:

$ sudo headscale nodes list --output json | jq '.[] | {name: .name, endpoints: .endpoints}'
{
  "name": "jump",
  "endpoints": null
}
{
  "name": "dennis-pc",
  "endpoints": null
}

Clients are advertising their endpoints to Headscale:

From dennis-pc

"Self": {
  "Addrs": [
    "107.5.7.157:55181",
    "192.168.60.58:55181"
  ]
}

From jump

"Self": {
  "Addrs": [
    "107.5.7.157:49694",
    "192.168.60.60:49694",
    "192.168.70.44:49694",
    "192.168.100.53:49694"
  ]
}

But peers receive null endpoint data:

dennis-pc's view of jump:

"Peer": {
  "nodekey:...": {
    "HostName": "Jump",
    "Addrs": null,  // ← Should contain jump's endpoints
    "Relay": "ord"
  }
}

jump's view of dennis-pc

"Peer": {
  "nodekey:...": {
    "HostName": "Dennis-PC",
    "Addrs": null,  // ← Should contain dennis-pc's endpoints
    "Relay": "ord"
  }
}

Result: Forced DERP relay despite being on same subnet:

# Both machines on 192.168.60.0/24
$ ping 192.168.60.58
Reply from 192.168.60.58: time=2ms  # Direct local ping works

$ tailscale status
100.64.0.1  jump       jbtech@        windows  active; relay "ord", tx 572 rx 476
# ↑ Relaying through Chicago despite being 2ms away

Expected Behavior


Headscale should store endpoint information in the database
Peers should receive each other's endpoint lists in the network map (Addrs field populated)
Direct connections should be established when possible (especially on same subnet)
DERP relay should only be used as fallback when direct connection fails
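
One quick way to verify whether a given peer connection is direct or relayed is tailscale ping. A minimal check using the node names from this report; the output lines are illustrative:

tailscale ping jump
# relayed: pong from jump (100.64.0.1) via DERP(ord) in 34ms
# direct:  pong from jump (100.64.0.1) via 192.168.60.60:49694 in 1ms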

Steps To Reproduce


1. Install Headscale v0.27.0 on Ubuntu 24.04
2. Configure with the following relevant settings:

server_url: https://headscale1.jbtech.me
randomize_client_port: true
derp:
  server:
    enabled: true
    region_id: 999
    stun_listen_addr: "0.0.0.0:3478"
  urls:
    - https://controlplane.tailscale.com/derpmap/default
policy:
  mode: file
  path: /etc/headscale/policy.json

3. Create ACL policy allowing all communication:

{
  "acls": [
    {
      "action": "accept",
      "src": ["*"],
      "dst": ["*:*"]
    }
  ]
}

4. Register two Windows nodes on the same subnet (Tailscale v1.90.4):

tailscale up --login-server=https://headscale1.jbtech.me --authkey=<preauth-key>

5. Verify nodes see each other:

tailscale status
100.64.0.1  jump       jbtech@        windows  -
100.64.0.2  dennis-pc  williamellis@  windows  active; relay "ord"

6. Check Headscale database for endpoints:

sudo headscale nodes list --output json | jq '.[] | {name: .name, endpoints: .endpoints}'
# Shows "endpoints": null for both nodes

7. Check client JSON status:

tailscale status --json | Out-File status.json
# Shows Peer.Addrs = null despite Self.Addrs being populated
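
Where jq is available, the same check can be narrowed to just the relevant fields. This is only an illustrative filter over the status JSON (field names as they appear in the output quoted above):

tailscale status --json | jq '{self: .Self.Addrs, peers: [.Peer[] | {HostName, Addrs, CurAddr, Relay}]}'
# On a healthy tailnet the peer Addrs lists are populated; here they come back null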

Environment

  • OS: Ubuntu 24.04 LTS (Headscale server), Windows 11 (clients)
  • Headscale version: v0.27.0
  • Tailscale client version: 1.90.4-t0d7298602-g1c96c3ed9
  • Network topology: Both clients on same physical subnet (192.168.60.0/24)

Runtime environment

  • Headscale is behind a (reverse) proxy: no
  • Headscale runs in a container: no

Debug information

Additional Context
This appears to be a regression - we have a production Tailscale network (official coordination server) with 60+ nodes that establishes direct connections correctly after applying randomizeClientPort: true to the ACL policy. The same configuration applied to Headscale does not propagate endpoints.
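
For context, that setting lives at the top level of the Tailscale ACL policy file. A minimal sketch of the policy shape assumed here (abridged, for illustration only):

{
  "randomizeClientPort": true,
  "acls": [
    { "action": "accept", "src": ["*"], "dst": ["*:*"] }
  ]
}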

Comparison:

  • Production Tailscale: Direct connections work, Peer.Addrs populated with endpoint list
  • Headscale v0.27.0: All DERP relay, Peer.Addrs = null, database shows endpoints: null

No endpoint updates in logs:

sudo journalctl -u headscale --since "5 minutes ago" | grep -i "endpoint"
# No output - endpoints not being received/stored
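
If it helps, Headscale's log verbosity can be raised to check whether endpoint updates arrive at all. A rough sketch, assuming the default config location and the log settings from the sample config:

# In /etc/headscale/config.yaml:
#   log:
#     level: debug
sudo systemctl restart headscale
sudo journalctl -u headscale -f | grep -i "endpoint"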

Logs show nodes connecting successfully:

2025-10-31T20:42:24-04:00 INF Node connected node.id=3 node.name=jump
2025-10-31T20:42:24-04:00 INF Node connected node.id=4 node.name=dennis-pc

This bug effectively breaks peer-to-peer connectivity and forces all traffic through DERP relays, significantly degrading performance (30ms+ latency instead of <5ms for same-subnet connections).


EDIT(nblock): added code blocks to improve readability.

adam added the question and bug labels 2025-12-29 02:28:28 +01:00
adam closed this issue 2025-12-29 02:28:29 +01:00
Author
Owner

@nblock commented on GitHub (Nov 1, 2025):

> Headscale database shows null endpoints

There's no endpoints field in Headscale's node status, but the information is stored in the database. Can you please provide the output of:

sqlite3 -readonly -cmd ".mode line" /path/to/db.sqlite 'SELECT hostname, given_name, endpoints, host_info FROM nodes'
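
For deployments on PostgreSQL rather than SQLite, a roughly equivalent query might be (database name, user and connection details are assumptions about a typical setup):

sudo -u postgres psql -d headscale -c 'SELECT hostname, given_name, endpoints, host_info FROM nodes;'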

Are you able to establish direct connections with randomize_client_port: false ?

Author
Owner

@bradgarrison commented on GitHub (Nov 1, 2025):

Here's the PostgreSQL database output (using PostgreSQL instead of SQLite):

 id | hostname  | given_name | endpoints
----+-----------+------------+-----------
  3 | jump      | jump       | ["107.5.7.157:49694","192.168.60.60:49694","192.168.70.44:49694","192.168.100.53:49694"]
  4 | dennis-pc | dennis-pc  | ["107.5.7.157:55181","192.168.60.58:55181"]

As you can see, the endpoints column DOES contain endpoint data in the database. However, when querying via the API or CLI (headscale nodes list --output json), the endpoints field returns null:

{
  "name": "jump",
  "endpoints": null
}
{
  "name": "dennis-pc",
  "endpoints": null
}

This confirms the issue is with the API/response serialization, not with endpoint storage in the database. The endpoints are stored correctly in PostgreSQL, but they're not being returned by the API, which prevents clients from receiving peer endpoint information in their network maps (resulting in null Peer.Addrs and forced DERP relay).

Additional diagnostic information:

Headscale version:
headscale version v0.27.0
commit: 450a7b15ec7b08926738e308bd11ec17753d06ab
build time: 2025-10-27T10:18:57Z
built with: go1.25.1 linux/amd64

PostgreSQL version: 15.14

Summary of the bug:

  • Clients are advertising endpoints to Headscale (visible in client status JSON)
  • Endpoints ARE being stored in the PostgreSQL database
  • API/CLI returns endpoints: null
  • Clients receive null Peer.Addrs in network map
  • Result: All connections forced through DERP relay

Regarding randomize_client_port: I haven't tested with this disabled yet, but can test if helpful for diagnosis.
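
For reference, testing that would just mean flipping the existing key in the config shown earlier and restarting the service (assuming the same config path and systemd unit as above):

# In /etc/headscale/config.yaml, set:
#   randomize_client_port: false
sudo systemctl restart headscale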

Author
Owner

@nblock commented on GitHub (Nov 1, 2025):

> However, when querying via the API or CLI (headscale nodes list --output json), the endpoints field returns null:

There's no endpoints field in the output of headscale nodes list --output json. You produced that yourself by piping it through jq.

Please provide the output of tailscale debug netmap for both nodes (in full, add as attachment)
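
For the Windows clients in this report, one way to capture that in full for attachment might be (PowerShell, mirroring the Out-File pattern used earlier; run on each node and adjust the filename):

tailscale debug netmap | Out-File netmap-jump.json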

Please use code blocks to improve readability, thx.

Author
Owner

@bradgarrison commented on GitHub (Nov 1, 2025):

UPDATE: After further testing, the direct connection issue appears to be intermittent and possibly machine-specific rather than a systematic Headscale problem.
Current findings:

  • Some Windows nodes on the same subnet establish direct connections successfully
  • Other Windows nodes (including the test machines in my original report) fail to establish direct connections and fall back to DERP relay
  • The issue persists even with production Tailscale coordination on the affected machines
  • Network configuration appears correct (firewall rules allow UDP, randomizeClientPort is enabled)

Next steps:
I'll conduct more thorough testing this week to:

  • Identify what configuration differences exist between working and non-working machines
  • Test with fresh VM builds to rule out machine-specific issues
  • Gather more diagnostic data (netcheck output, detailed logs) from both working and failing nodes
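
Roughly, the diagnostic data in the last point would come from commands like the following on each node (peer name is a placeholder):

tailscale netcheck                   # UDP reachability, nearest DERP, port mapping support
tailscale ping <peer-hostname>       # shows whether the path is direct or via DERP
tailscale status --json | Out-File status.json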

I'll update this issue with findings. The problem may not be Headscale-specific after all, but rather related to specific Windows client configurations or network conditions. Apologies for the premature report - I'll have more concrete data soon.

Author
Owner

@nblock commented on GitHub (Nov 12, 2025):

No further details sent, closing for now.


Reference: starred/headscale#1134