[Bug] Re-enabling a fallback node's redundant routes made both main and fallback node routes primary=false #1091

Closed
opened 2025-12-29 02:28:13 +01:00 by adam · 2 comments

Originally created by @rgarrigue on GitHub (Aug 27, 2025).

Is this a support request?

  - [x] This is not a support request

Is there an existing issue for this?

  - [x] I have searched the existing issues

Current Behavior

I have 2 nodes with identical routes, the "main" and the "fallback", for redundancy purposes. They were properly routing traffic to a database, the "main" routes being listed with primary=true and the "fallback" with primary=false.

I then added one more node, with routes to the "main" and "fallback". Out of laziness I ran a for loop to enable all the routes, including the already enabled ones, and we then lost connections through the VPN to the database. As shown in the screenshot below, the "main" routes 1️⃣ and the "fallback" routes 2️⃣ have primary=false.

![Image](https://github.com/user-attachments/assets/00dd4bb7-67b1-4838-8806-e5224f1c8706)
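A minimal sketch of such a loop, assuming the 0.25.x `headscale routes enable -r <route-id>` syntax (the route IDs are placeholders, not the real ones):

```bash
# Blindly enable every route, whether or not it is already enabled.
# IDs come from "headscale routes list"; the ones below are placeholders.
for id in 1 2 3 4 5 6; do
  headscale routes enable -r "$id"
done
```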

I disabled the "fallback" routes 1️⃣ 2️⃣, but the "main" routes didn't turn primary=true 3️⃣.

![Image](https://github.com/user-attachments/assets/fa8bca23-47e9-4c7e-ba13-3118eed2746e)

I enabled the "main" routes 1️⃣ and that did turn its routes to primary=true 2️⃣; the connection to the database was back ✔️.

![Image](https://github.com/user-attachments/assets/3614fefa-1940-420d-baf0-f95bc9f05ddd)

After that I enabled the "fallback" routes again, and primary stayed true on the "main" and false on the "fallback".

Expected Behavior

I expect traffic/routing to keep working if I enable already enabled routes. In other words, enabling an already enabled route should be idempotent, even if it is a seemingly useless thing to do.

Steps To Reproduce

  1. Add two tailscale nodes advertising identical routes
  2. Enable them both
  3. Check the state of the routes: one should be primary=true, the other primary=false
  4. Enable them both again
  5. Check the state of the routes: both end up with primary=false (see the command sketch after this list)
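A condensed command sketch of these steps, assuming the 0.25.x CLI (the login server, subnet, and route IDs are placeholders):

```bash
# On both nodes: advertise the same subnet to headscale.
tailscale up --login-server=https://vpn.example.com --advertise-routes=10.0.0.0/24

# On the headscale server: enable both routes, check, then enable them again.
headscale routes list           # note the two route IDs (say 1 and 2)
headscale routes enable -r 1
headscale routes enable -r 2
headscale routes list           # expected: one route primary=true, the other primary=false
headscale routes enable -r 1    # enabling again should be a no-op...
headscale routes enable -r 2
headscale routes list           # observed on 0.25.1: both routes primary=false
```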

Environment

"main" instance (which also happens to run headscale)
- OS: Debian 11
- Headscale version: 0.25.1
- Tailscale version: 1.80.2

"fallback" instance (which runs tailscale only)
- OS: Debian 13
- Tailscale version: 1.86.2

Runtime environment

  - [ ] Headscale is behind a (reverse) proxy
  - [ ] Headscale runs in a container

Debug information

There is no policy on my Headscale.

All clients lost and regained connectivity, so I believe it isn't about the clients, and there is a lot of sensitive information in the tailscale dumps, so I'll provide them only if deemed really relevant.

Headscale configuration

server_url: https://vpn.REDACTED
listen_addr: 0.0.0.0:443
metrics_listen_addr: 127.0.0.1:9090 # No monitoring
grpc_listen_addr: 127.0.0.1:50443 # No remote CLI
grpc_allow_insecure: false
noise:
  private_key_path: /var/lib/headscale/noise_private.key
prefixes:
  v6: fd7a:115c:a1e0::/48
  v4: 100.64.0.0/10
  allocation: random
derp:
  server:
    enabled: false
    region_id: 999
    region_code: "headscale"
    region_name: "Headscale Embedded DERP"
    stun_listen_addr: "0.0.0.0:3478"
    private_key_path: /var/lib/headscale/derp_server_private.key
    automatically_add_embedded_derp_region: true
    ipv4: 1.2.3.4
    ipv6: 2001:db8::1
  urls:
    - https://controlplane.tailscale.com/derpmap/default
  paths: []
  auto_update_enabled: true
  update_frequency: 24h
disable_check_updates: true
ephemeral_node_inactivity_timeout: 30m
database:
  type: sqlite
  debug: false
  gorm:
    prepare_stmt: true
    parameterized_queries: true
    skip_err_record_not_found: true
    slow_threshold: 1000
  sqlite:
    path: /var/lib/headscale/db.sqlite
    write_ahead_log: true
acme_url: https://acme-v02.api.letsencrypt.org/directory
acme_email: "ops@REDACTED"
tls_letsencrypt_hostname: "vpn.REDACTED"
tls_letsencrypt_cache_dir: /var/lib/headscale/cache
tls_letsencrypt_challenge_type: HTTP-01
tls_letsencrypt_listen: ":http"
log:
  format: text
  level: info
policy:
  mode: file
  path: ""
dns:
  magic_dns: true
  base_domain: REDACTED.internal
  nameservers:
    global:
      - 9.9.9.9
      - 149.112.112.112
      - 2620:fe::fe
      - 2620:fe::9
    split:
      {}
  search_domains: []
unix_socket: /var/run/headscale/headscale.sock
unix_socket_permission: "0770"
oidc:
  only_start_if_oidc_is_available: true
  issuer: "https://accounts.google.com"
  client_id: "REDACTED.apps.googleusercontent.com"
  client_secret: "GOCSPX-REDACTED"
  expiry: 30d
  use_expiry_from_token: false
  scope: ["openid", "profile", "email"]
  extra_params:
    domain_hint: REDACTED
  allowed_domains:
    - REDACTED
  allowed_groups: []
  allowed_users: []
  strip_email_domain: true
logtail:
  enabled: false
randomize_client_port: false # default static port 41641
adam added the bug label 2025-12-29 02:28:13 +01:00
adam closed this issue 2025-12-29 02:28:13 +01:00

@nblock commented on GitHub (Aug 27, 2025):

Please try to reproduce with the current version of Headscale: 0.26.1.


@rgarrigue commented on GitHub (Aug 29, 2025):

Hello,

It seems the issue is gone with 0.26.1

![Image](https://github.com/user-attachments/assets/b5c31ff8-5bb0-4fd3-a2cc-a42db31bc595)

Sorry for not trying it out first.

Thanks for Headscale :-)

Reference: starred/headscale#1091