Split DNS only works sporadically #388

Closed
opened 2025-12-29 01:28:10 +01:00 by adam · 1 comment
Originally created by @thorstenweber83 on GitHub (Dec 8, 2022).

Headscale version: 0.17.1
Platforms: Linux x86-64, aarch64 (some clients)
Tailscale versions: 1.32.2, 1.30.0
Kernel versions: 5.15.81 (hs server), 5.15.32, 5.19.5, 5.15.79 (ts clients)

I have a headscale instance with the following config (generated via the headscale NixOS module):

acl_policy_path: null
db_host: null
db_name: null
db_password_file: null
db_path: /var/lib/headscale/db.sqlite
db_port: null
db_type: sqlite3
db_user: null
derp:
  auto_update_enable: true
  paths: []
  update_frequency: 24h
  urls:
  - https://controlplane.tailscale.com/derpmap/default
disable_check_updates: true
dns_config:
  base_domain: ''
  domains: []
  magic_dns: true
  nameservers:
  - 1.1.1.1
  override_local_dns: true
  restricted_nameservers:
    my-domain.tld:
    - 100.64.0.5
ephemeral_node_inactivity_timeout: 30m
ip_prefixes:
- 100.64.0.0/10
listen_addr: 0.0.0.0:8081
log:
  format: text
  level: info
metrics_listen_addr: 0.0.0.0:9090
noise:
  private_key_path: /var/lib/headscale/noise_private.key
oidc:
  client_id: ''
  client_secret_file: null
  domain_map: {}
  issuer: ''
private_key_path: /var/lib/headscale/private.key
server_url: https://my-server.other-domain.tld
tls_cert_path: null
tls_key_path: null
tls_letsencrypt_cache_dir: /var/lib/headscale/.cache
tls_letsencrypt_challenge_type: HTTP-01
tls_letsencrypt_hostname: ''
tls_letsencrypt_listen: :http
unix_socket: /run/headscale/headscale.sock

It is served on https://my-server.other-domain.tld via an nginx reverse proxy.
My nameserver for split DNS is reachable at 100.64.0.5:53 inside the tailnet and works when I run

dig @100.64.0.5 subdomain.my-domain.tld

The problem is that resolving names under my-domain.tld via the Tailscale resolver at 100.100.100.100, like this:

dig @100.100.100.100 subdomain.my-domain.tld

only works sometimes.
Often the Tailscale resolver returns NXDOMAIN.
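
To pin down when resolution flips between working and broken, the dig query above can be repeated on a timer and the response status recorded. This is only a rough sketch: the helper names (`dig_status`, `query_status`, `poll`) are made up for illustration, and it assumes `dig` is on the PATH and prints its usual `status:` header line.

```python
import re
import subprocess
import time

# dig prints a header such as:
#   ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 4242
STATUS_RE = re.compile(r"status: ([A-Z]+)")


def dig_status(output):
    """Extract the response status (NOERROR, NXDOMAIN, ...) from dig output."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else None


def query_status(server, name, timeout=30):
    """Run one dig query against `server` and return its status, or None."""
    try:
        result = subprocess.run(
            ["dig", f"@{server}", name, "+noall", "+comments"],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return dig_status(result.stdout)


def poll(server, name, count=60, interval=5.0):
    """Query `count` times, `interval` seconds apart; return (time, status) pairs."""
    samples = []
    for _ in range(count):
        samples.append((time.strftime("%H:%M:%S"), query_status(server, name)))
        time.sleep(interval)
    return samples
```

Calling `poll("100.100.100.100", "subdomain.my-domain.tld")` returns a list of timestamps and statuses that can be lined up against the tailscaled log.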

Looking at the tailscale client's logs, I discovered that when it reconfigures, it does not always include my DNS server in Resolvercfg:

- - - - 8< - - - - - - - - - - - - - - - - -
wgengine: Reconfig: configuring router
Dez 06 23:56:45 client1 tailscaled[864]: wgengine: Reconfig: configuring DNS
Dez 06 23:56:45 client1 tailscaled[864]: dns: Set: {DefaultResolvers:[1.1.1.1] Routes:{my-domain.tld.:[100.64.0.5]}+65arpa SearchDomains:[my-domain.tld.] Hosts:5}
Dez 06 23:56:45 client1 tailscaled[864]: dns: Resolvercfg: {Routes:{.:[1.1.1.1] my-domain.tld.:[100.64.0.5]} Hosts:5 LocalDomains:[]+65arpa}
Dez 06 23:56:45 client1 tailscaled[864]: dns: OScfg: {Hosts:[] Nameservers:[100.100.100.100] SearchDomains:[my-domain.tld.] MatchDomains:[]}
- - - - 8< - - now it's working  - - - - - -
[...]
- - - - 8< - - - - - - - - - - - - - - - - - 
Dez 06 23:57:35 client1 tailscaled[864]: wgengine: Reconfig: configuring router
Dez 06 23:57:35 client1 tailscaled[864]: wgengine: Reconfig: configuring DNS
Dez 06 23:57:35 client1 tailscaled[864]: dns: Set: {DefaultResolvers:[1.1.1.1] Routes:{my-domain.tld.:[]}+65arpa SearchDomains:[my-domain.tld.] Hosts:5}
Dez 06 23:57:35 client1 tailscaled[864]: dns: Resolvercfg: {Routes:{.:[1.1.1.1]} Hosts:5 LocalDomains:[my-domain.tld.]+65arpa}
Dez 06 23:57:35 client1 tailscaled[864]: dns: OScfg: {Hosts:[] Nameservers:[100.100.100.100] SearchDomains:[my-domain.tld.] MatchDomains:[]}
- - - - 8< - - now it's broken - - - - - - -
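
The difference between the two excerpts is whether `Resolvercfg` still carries a route for my-domain.tld. A small filter over the tailscaled log can flag each reconfiguration where that route is missing. This is a sketch based only on the two log excerpts above: the function names are hypothetical, and the line format may differ between tailscale versions.

```python
import re

# Captures the inner "Routes:{...}" map of a "dns: Resolvercfg:" log line.
ROUTES_RE = re.compile(r"dns: Resolvercfg: \{Routes:\{(?P<routes>.*?)\} Hosts:")
# One map entry, e.g. "my-domain.tld.:[100.64.0.5]".
ENTRY_RE = re.compile(r"(\S+?):\[([^\]]*)\]")


def resolver_routes(line):
    """Return {suffix: [resolver, ...]} for a Resolvercfg line, else None."""
    match = ROUTES_RE.search(line)
    if match is None:
        return None
    return {
        suffix: addrs.split()
        for suffix, addrs in ENTRY_RE.findall(match.group("routes"))
    }


def has_split_route(line, domain):
    """True if the line is a Resolvercfg line with a non-empty route for `domain`."""
    routes = resolver_routes(line)
    return routes is not None and bool(routes.get(domain))
```

Feeding it the output of `journalctl -u tailscaled` line by line and printing every Resolvercfg line where `has_split_route(line, "my-domain.tld.")` is false gives the timestamps at which the client dropped the split DNS route.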

When looking at my headscale logs, I couldn't find any suspicious log messages. The only log lines during the time in question were:

Dez 06 23:56:19 server1 headscale-start[43971]: 2022-12-06T23:56:19+01:00 INF Client sent endpoint update and is ok with a response without peer list handler=PollNetMap machine=client1 noise=true
Dez 06 23:56:42 server1 headscale-start[43971]: 2022-12-06T23:56:42+01:00 INF Client sent endpoint update and is ok with a response without peer list handler=PollNetMap machine=client1 noise=true
Dez 06 23:57:30 server1 headscale-start[43971]: 2022-12-06T23:57:30+01:00 INF Client sent endpoint update and is ok with a response without peer list handler=PollNetMap machine=client1 noise=true

I'm not sure if this is an issue with headscale or the tailscale client, but I would like to rule out headscale before opening an issue with tailscale.

Any help is greatly appreciated.
Thank you very much!

adam added the bug label 2025-12-29 01:28:10 +01:00
adam closed this issue 2025-12-29 01:28:10 +01:00

@thorstenweber83 commented on GitHub (Dec 12, 2022):

I found the cause of my problems:

My user account/namespace was named "my-domain.tld", the same as my split DNS domain, which would have been nice with MagicDNS.

After choosing another name for my user account, the problems with DNS resolution went away.
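
For anyone wanting to check their own setup for the same clash, a small script can compare namespace names against the split DNS domains in the config. This is a minimal sketch, not part of headscale: `colliding_namespaces` is a made-up helper, it assumes the config has already been loaded into a dict shaped like the YAML above, and it treats only an exact name match (ignoring case and a trailing dot) as a collision.

```python
def normalize(name):
    """Lowercase a DNS-style name and drop any trailing dot."""
    return name.strip().rstrip(".").lower()


def colliding_namespaces(namespaces, dns_config):
    """Return namespace names that equal a restricted_nameservers domain.

    Assumption: only exact matches count as a collision, as in this issue.
    """
    restricted = dns_config.get("restricted_nameservers") or {}
    split_domains = {normalize(domain) for domain in restricted}
    return sorted(name for name in namespaces if normalize(name) in split_domains)


# Values taken from the config in this issue.
dns_config = {
    "magic_dns": True,
    "nameservers": ["1.1.1.1"],
    "restricted_nameservers": {"my-domain.tld": ["100.64.0.5"]},
}
print(colliding_namespaces(["my-domain.tld", "other-name"], dns_config))
```

The list of namespace names has to be supplied by hand, for example copied from the headscale CLI output.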

Reference: starred/headscale#388