[BUG] Lost traffic after updating to 0.23.0 (MTU-related) #861

Closed
opened 2025-12-29 02:24:57 +01:00 by adam · 5 comments
Owner

Originally created by @pstvasko on GitHub (Nov 21, 2024).

Is this a support request?

  • This is not a support request

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The tunnel breaks on gate1.

Expected Behavior

Traffic should not be lost.

Steps To Reproduce

Hi. After updating to version 0.23, there is an issue in the Tailscale network.
I have a complex network connecting two Tailscale installations:
100.64.0.0 - headscale1 - gate1 - gate2 - headscale2 - 100.80.0.0

When I transfer data between 100.64.0.0 and 100.80.0.0, the speed peaks at about 2 Gbps and problems begin: packets stop flowing on the segment 100.64.0.0 - headscale1 (although pings work if I reduce the MTU to 932).

There are about 500 clients in the network. Could you advise in which direction I should investigate?
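The fact that an MTU of 932 restores connectivity points at encapsulation overhead on some hop of the gate1/gate2 path. As a rough illustration (not taken from the report), here is the standard arithmetic for plain WireGuard-over-UDP over IPv4; the constants are the well-known header sizes, and the 1280-byte figure is Tailscale's default tunnel MTU:

```python
# Sketch of WireGuard-over-UDP encapsulation overhead (IPv4 only).
# DERP-relayed or IPv6 paths add different overhead, so treat these
# numbers as an approximation, not a diagnosis of this network.

IPV4_HEADER = 20   # outer IPv4 header
UDP_HEADER = 8     # outer UDP header
WG_OVERHEAD = 32   # WireGuard data-message header (16 B) + Poly1305 tag (16 B)

def max_inner_packet(path_mtu: int) -> int:
    """Largest inner IP packet that fits in one outer packet on this path."""
    return path_mtu - IPV4_HEADER - UDP_HEADER - WG_OVERHEAD

print(max_inner_packet(1500))  # 1440
print(max_inner_packet(1340))  # 1280 -- Tailscale's default tunnel MTU
```

So a tunnel MTU of 1280 only works end to end if every underlying hop carries at least 1340 bytes; a hop with a smaller effective MTU (or one that drops ICMP "fragmentation needed") produces exactly this kind of blackhole under load.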

Environment

- OS: AlmaLinux8
- Headscale version: 0.23.0
- Tailscale version: 1.76.6

Runtime environment

  • Headscale is behind a (reverse) proxy
  • Headscale runs in a container

Anything else?

Headscale

2024-11-21T21:31:33Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:35Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:37Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:38Z ERR Failed to fetch node from the database with node key: nodekey:10715a5defd407c11146b436449e3fdc771d8e4adc68b8dac0077e5e3d64d370 handler=NoisePollNetMap
2024-11-21T21:31:39Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:40Z INF home/runner/work/headscale/headscale/hscontrol/auth_noise.go:44 > unsupported client connected client_version=58 min_version=61
2024-11-21T21:31:41Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:41Z INF home/runner/work/headscale/headscale/hscontrol/auth_noise.go:44 > unsupported client connected client_version=58 min_version=61
2024-11-21T21:31:42Z INF home/runner/work/headscale/headscale/hscontrol/auth.go:28 > Successfully sent auth url: https://headscale.*****/oidc/register/mkey:ad30ca2d2f62ca426624930d6455211e40554add598c1a99420ffc8e6a2d8c0c expiry=-62135596800 followup=https://headscale.*****/oidc/register/mkey:ad30ca2d2f62ca426624930d6455211e40554add598c1a99420ffc8e6a2d8c0c machine_key=[rTDKL] node=vm-po4 node_key=[QxxVm] node_key_old=[bYjMr]
2024-11-21T21:31:43Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771
2024-11-21T21:31:45Z ERR update not sent, context cancelled error="context deadline exceeded" node.id=771

tailscale:

22 00:31:10 tailscaled[1378041]: wgengine: idle peer [Jpvng] now active, reconfiguring WireGuard 
22 00:31:10 tailscaled[1378041]: wgengine: Reconfig: configuring userspace WireGuard config (with 70/459 peers) 
22 00:31:20 tailscaled[1378041]: wgengine: Reconfig: configuring userspace WireGuard config (with 69/459 peers) 
22 00:31:35 tailscaled[1378041]: open-conn-track: flow TCP (TCP 10.10.0.105:44438 => 100.80.0.2:9188) got RST by peer 
22 00:31:38 tailscaled[1378041]: open-conn-track: flow TCP (TCP 10.10.0.105:55990 => 100.80.0.2:9187) got RST by peer 
22 00:31:38 tailscaled[1378041]: control: NetInfo: NetInfo{varies=false hairpin= ipv6=false ipv6os=false udp=true icmpv4=false derp=#999 portmap= link="" firewallmode="ipt-default"} 
22 00:31:53 tailscaled[1378041]: wgengine: idle peer [Qif6f] now active, reconfiguring WireGuard 
22 00:31:53 tailscaled[1378041]: wgengine: Reconfig: configuring userspace WireGuard config (with 69/459 peers) 
22 00:32:05 tailscaled[1378041]: open-conn-track: flow TCP (TCP 10.10.0.105:36450 => 100.80.0.2:9188) got RST by peer 
22 00:32:08 tailscaled[1378041]: open-conn-track: flow TCP (TCP 10.10.0.105:43332 => 100.80.0.2:9187) got RST by peer 
22 00:32:10 tailscaled[1378041]: wgengine: idle peer [qoqFH] now active, reconfiguring WireGuard 
22 00:32:10 tailscaled[1378041]: wgengine: Reconfig: configuring userspace WireGuard config (with 70/459 peers)
adam added the stalebug labels 2025-12-29 02:24:57 +01:00
adam closed this issue 2025-12-29 02:24:57 +01:00

@kradalby commented on GitHub (Nov 22, 2024):

Have you changed the Tailscale version on these nodes recently? I am not ruling out that some parameter changed in Headscale, but I would be surprised if that change had any real impact on the client side; changes on the client would be more expected, if I had to guess.

Can you try with multiple Tailscale versions vs multiple Headscale versions?


@pstvasko commented on GitHub (Nov 26, 2024):

I tested all possible client versions supported by 0.23.
The connection is also restored after reconnecting to the network.

I'm having issues with Tailscale. Specifically, after reaching a speed of 2 Gbps, the connection drops after about a minute and doesn't recover until I restart the tunnel itself.

At the same time, I can ping the node like this:
ping -s 900 100.64.0.1

But not like this:
ping -s 1400 100.64.0.1

I have around 500 clients and a complex network between data centers.
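The symptom above (900-byte payloads pass, 1400-byte payloads vanish) is the classic MTU-blackhole pattern, and the cutoff can be pinned down by bisecting over ping payload sizes. A minimal sketch, assuming a hypothetical `probe(size)` callback that wraps something like `ping -c 1 -M do -s <size> 100.64.0.1` (DF set so the kernel never fragments the probe):

```python
def find_max_payload(probe, lo=0, hi=1472):
    """Binary-search the largest payload size for which probe(size) succeeds.

    `probe` is a hypothetical callback returning True if a single
    don't-fragment ping with that payload size gets a reply.
    """
    best = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if probe(mid):
            best = mid      # this size passed; try larger
            lo = mid + 1
        else:
            hi = mid - 1    # this size was dropped; try smaller
    return best

# Simulated path that drops anything above a 904-byte ICMP payload
# (904 data + 8 ICMP + 20 IP = a 932-byte packet, matching the report):
print(find_max_payload(lambda size: size <= 904))  # 904
```

Note that `ping -s 1400` sends a 1428-byte inner packet (1400 data + 8 ICMP header + 20 IP header), while `-s 900` sends 928 bytes, which is consistent with the working MTU of 932 mentioned in the original report.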

(screenshot attached in the original issue)


@github-actions[bot] commented on GitHub (Feb 25, 2025):

This issue is stale because it has been open for 90 days with no activity.


@github-actions[bot] commented on GitHub (Mar 4, 2025):

This issue was closed because it has been inactive for 14 days since being marked as stale.


@andreyrd commented on GitHub (Jun 12, 2025):

I'm seeing something like this. As soon as the "context deadline exceeded" errors start appearing in my Headscale logs, I'm almost guaranteed to start having connectivity issues between nodes, except I see "timeout" instead of "RST".

Reference: starred/headscale#861