[Bug] Headscale attempts to use non-existent TLS certs when using OIDC behind a reverse proxy #890

Closed
opened 2025-12-29 02:25:29 +01:00 by adam · 10 comments

Originally created by @vguttmann on GitHub (Dec 18, 2024).

Is this a support request?

  • This is not a support request

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Context: in our deployment, a central proxy handles everything HTTPS, so the proxy talks HTTP to Headscale; however, the server URL is still https, since all incoming requests will be HTTPS.

When marking OIDC as mandatory, headscale attempts to read a non-existent certificate at /etc/letsencrypt/live/headscale/cert.pem.
That certificate did exist once, but since we're handling HTTPS at the proxy level now, I deleted it via certbot.

Expected Behavior

Headscale starts and works fine with OIDC required for startup enabled.

Steps To Reproduce

  1. Use a reverse proxy that handles HTTPS, an HTTP server URL, and empty TLS cert paths in the config file
  2. Mark OIDC as mandatory in the config file
  3. Headscale crashes on startup
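
A minimal config sketch matching these steps (the values mirror the full config shared later in this thread; domains are placeholders):

```yaml
# Sketch only: the config keys relevant to the reproduction steps.
server_url: "https://cluster.<root domain>:443"  # the proxy terminates HTTPS in front of headscale
listen_addr: 0.0.0.0:443

# TLS is handled entirely by the reverse proxy, so all cert settings are left empty
tls_letsencrypt_hostname: ""
tls_cert_path: ""
tls_key_path: ""

oidc:
  only_start_if_oidc_is_available: true  # "OIDC mandatory" from step 2
  issuer: "https://intranet.<root domain>/fast-sso/auth/"
  client_id: "headscale"
```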

Environment

- OS: Ubuntu 20.04
- Headscale version: 0.23.0
- Tailscale version: [not applicable]

Runtime environment

  • Headscale is behind a (reverse) proxy

Anything else?

No response

adam added the bug label 2025-12-29 02:25:29 +01:00
adam closed this issue 2025-12-29 02:25:29 +01:00

@kradalby commented on GitHub (Dec 18, 2024):

0.22.3 is not supported anymore; can you test and see if it still applies with 0.23.0?


@vguttmann commented on GitHub (Dec 18, 2024):

Apologies, that issue was written somewhat from memory; I thought I had installed 0.22.3 because that's what I remember from my setup at home.

In this case I actually set up 0.23.0, and never anything else.


@vguttmann commented on GitHub (Dec 18, 2024):

I've attempted using http instead of https in the server URL as well, but the issue persists. Here's my (somewhat sanitized) config:

server_url: "https://cluster.<root domain>:443"
listen_addr: 0.0.0.0:443
metrics_listen_addr: 0.0.0.0:9090
grpc_listen_addr: 127.0.0.1:50443
grpc_allow_insecure: false
noise:
  private_key_path: /var/lib/headscale/noise_private.key
prefixes:
  v6: fd7a:115c:a1e0::/48
  v4: 100.64.0.0/10

  # Strategy used for allocation of IPs to nodes, available options:
  # - sequential (default): assigns the next free IP from the previous given IP.
  # - random: assigns the next free IP from a pseudo-random IP generator (crypto/rand).
  allocation: sequential

# DERP is a relay system that Tailscale uses when a direct
# connection cannot be established.
# https://tailscale.com/blog/how-tailscale-works/#encrypted-tcp-relays-derp
#
# headscale needs a list of DERP servers that can be presented
# to the clients.
derp:
  server:
    enabled: true
    region_id: 999
    region_code: "headscale"
    region_name: "Headscale Embedded DERP"
    stun_listen_addr: "0.0.0.0:3478"
    private_key_path: /var/lib/headscale/derp_server_private.key
    ipv4: 1.2.3.4
    ipv6: 2001:db8::1
  urls:
    - https://controlplane.tailscale.com/derpmap/default
  paths: []
  auto_update_enabled: true
  update_frequency: 24h
disable_check_updates: false
ephemeral_node_inactivity_timeout: 30m

database:
  type: sqlite
  debug: false
  gorm:
    prepare_stmt: true
    parameterized_queries: true
    skip_err_record_not_found: true
    slow_threshold: 1000
  sqlite:
    path: /var/lib/headscale/db.sqlite
    write_ahead_log: true

acme_url: https://acme-v02.api.letsencrypt.org/directory
acme_email: ""
tls_letsencrypt_hostname: ""
tls_letsencrypt_cache_dir: /var/lib/headscale/cache
tls_letsencrypt_challenge_type: HTTP-01
tls_letsencrypt_listen: ":http"
tls_cert_path: ""
tls_key_path: ""

log:
  format: text
  level: debug

policy:
  mode: file
  path: ""

dns:
  magic_dns: true
  base_domain: example.com
  nameservers:
    global:
      - 1.1.1.1
      - 1.0.0.1
      - 2606:4700:4700::1111
      - 2606:4700:4700::1001
    split:
      {}
  search_domains: []
  extra_records: []
  use_username_in_magic_dns: false

unix_socket: /var/run/headscale/headscale.sock
unix_socket_permission: "0770"

oidc:
  only_start_if_oidc_is_available: true
  issuer: "https://intranet.<root domain>/fast-sso/auth/"
  client_id: "headscale"
  client_secret: "<client secret>"
  scope: ["openid", "profile", "email", "roles", "web-origins"]

logtail:
  enabled: false
randomize_client_port: false

Our OIDC provider is a locally hosted instance of Keycloak, running behind exactly the same "the proxy handles TLS" style reverse proxy, in case that matters.


@mhahl commented on GitHub (Dec 18, 2024):

Hey @vguttmann, I'm running a similar setup to yours. I can't seem to replicate this, though I have never used Let's Encrypt in the past.

Could you try backing up and cleaning out tls_letsencrypt_cache_dir?


@vguttmann commented on GitHub (Dec 18, 2024):

That directory didn't even exist for me, maybe because I did not use Certbot the way it's usually used: we just use certbot to request certificates for a subdomain of our choice from Sectigo, without any actual verification (since they're just subdomains of our actual domain), using preshared keys, which obviously isn't really covered by the config file. Those requests were done purely as CLI commands.

I just left those entries unchanged as they were supplied (except for the cert and private key paths) and pointed headscale to /etc/letsencrypt/live/headscale/fullchain.pem and privkey.pem. Why it is now attempting to read cert.pem, even after I used certbot to delete the certificate and removed any mention of it from the config file, is a mystery to me.
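
For illustration, a rough sketch of what those TLS entries would have looked like before HTTPS was moved to the proxy, contrasted with the current empty values in the config above (this is a reconstruction based on the paths mentioned, not the actual old config):

```yaml
# Hypothetical earlier state, reconstructed from the paths mentioned above:
tls_cert_path: /etc/letsencrypt/live/headscale/fullchain.pem
tls_key_path: /etc/letsencrypt/live/headscale/privkey.pem

# Current state, after moving HTTPS handling to the reverse proxy:
# tls_cert_path: ""
# tls_key_path: ""
```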

If you can work with Proxmox VM dumps, I could send one over after sanitizing the image a bit; obviously I do not want to leak our CA keys.


@mhahl commented on GitHub (Dec 18, 2024):

If you could please attach the server logs from when it's starting up, that would be helpful for anyone trying to troubleshoot.


@vguttmann commented on GitHub (Dec 19, 2024):

Oh, right. Here's the output of journalctl -u headscale for a single start attempt:

Dec 13 19:38:14 headscale systemd[1]: headscale.service: Scheduled restart job, restart counter is at 1.
Dec 13 19:38:14 headscale systemd[1]: Started headscale.service - headscale coordination server for Tailscale.
Dec 13 19:38:14 headscale headscale[2207]: 2024-12-13T19:38:14Z WRN
Dec 13 19:38:14 headscale headscale[2207]: WARN: The "dns.use_username_in_magic_dns" configuration key is deprecated and has been removed. Please see the changelog for more details.
Dec 13 19:38:14 headscale headscale[2207]: 2024-12-13T19:38:14Z INF Opening database database=sqlite3 path=/var/lib/headscale/db.sqlite
Dec 13 19:38:14 headscale headscale[2207]: 2024-12-13T19:38:14Z INF ../../../home/runner/work/headscale/headscale/hscontrol/derp/server/derp_server.go:103 > DERP region: {RegionID:99>
Dec 13 19:38:14 headscale headscale[2207]: 2024-12-13T19:38:14Z INF ../../../home/runner/work/headscale/headscale/hscontrol/derp/server/derp_server.go:104 > DERP Nodes[0]: &{Name:999>
Dec 13 19:38:14 headscale headscale[2207]: 2024-12-13T19:38:14Z INF STUN server started at [::]:3478
Dec 13 19:38:14 headscale headscale[2207]: 2024-12-13T19:38:14Z INF Setting up a DERPMap update worker frequency=86400000
Dec 13 19:38:14 headscale headscale[2217]: 2024-12-13T19:38:19Z FTL ../../../home/runner/work/headscale/headscale/cmd/headscale/cli/serve.go:29 > Headscale ran into an error and had to shut down. error="configuring TLS settings: open /etc/letsencrypt/live/headscale/cert.pem: permission denied"
Dec 13 19:38:14 headscale systemd[1]: headscale.service: Main process exited, code=exited, status=1/FAILURE

@vguttmann commented on GitHub (Dec 19, 2024):

However, I'll also try setting up a new VM without ever touching certs, since your setup seems to work. I'll keep the old one, and as I said, if you want that machine for analysis, you can poke around in it.


@vguttmann commented on GitHub (Dec 19, 2024):

Oh my god. I'm so stupid. I just assumed that journalctl gives me the most recent stuff first. THE DATE WAS RIGHT THERE. THE TIMESTAMPS WERE THERE.

This was complete stupidity on my part.


@kradalby commented on GitHub (Dec 19, 2024):

No worries, glad you figured it out!

Reference: starred/headscale#890