[Bug] Upgrade from 0.26.1 --> 0.27.0 makes all clients unable to connect to headscale #1141

Closed
opened 2025-12-29 02:28:31 +01:00 by adam · 5 comments
Owner

Originally created by @phxyz12 on GitHub (Nov 8, 2025).

Is this a support request?

  • This is not a support request

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Dear all

I tried to upgrade Headscale from version 0.26.1 to 0.27.0. I am using Docker containers and Docker Compose. I followed the upgrade steps found here: https://headscale.net/stable/setup/upgrade/. I compared the config files and merged the few changes in v0.27.0 into the existing config file. I took a backup of the 0.26.1 instance.

After starting v0.27.0, none of the clients could connect to headscale anymore. In the logs, I saw many errors like this:

ERR noise upgrade failed error="noise handshake failed: decrypting machine key: chacha20poly1305: message authentication failed"

I didn't find a way to solve this other than deleting a client and registering it anew.

Expected Behavior

I would have wished that clients stay ok, but I don't know if this was intended. Maybe migrating registered clients between 0.26 and 0.27 is not supported?

Steps To Reproduce

  1. Install v0.26.1 using a Docker container and docker compose
  2. Front Headscale with Traefik reverse proxy
  3. Carry out upgrade procedure to 0.27.0 as described here: https://headscale.net/stable/setup/upgrade/
  4. Start headscale
  5. Try to connect with an existing (= already registered) client

Environment

- OS: Ubuntu 24.04
- Headscale version: 0.26.1 / 0.27.0
- Tailscale version: 1.90.6 and also 1.88.x
- Docker and Docker Compose (current versions for ubuntu)
- Traefik reverse proxy v3.5

Runtime environment

  • Headscale is behind a (reverse) proxy
  • Headscale runs in a container

Debug information

Headscale logs contain many lines like this:

ERR noise upgrade failed error="noise handshake failed: decrypting machine key: chacha20poly1305: message authentication failed"

Traefik labels (in compose file):

labels:
      - traefik.enable=true
      - traefik.http.services.headscale.loadbalancer.server.port=8080
      - traefik.http.services.headscale.loadbalancer.server.scheme=http
      - traefik.http.routers.headscale-https.entrypoints=websecure
      - traefik.http.routers.headscale-https.rule=Host(`headscale.domain.org`)
      - traefik.http.routers.headscale-https.tls=true
      - traefik.http.routers.headscale-https.tls.certresolver=lencrypt
      - traefik.http.routers.headscale-https.middlewares=my-traefik-plugin-geoblock@file  # geo block

Note: Headscale v0.26.1 worked without issues using the same Traefik config.

adam added the bug label 2025-12-29 02:28:31 +01:00
adam closed this issue 2025-12-29 02:28:31 +01:00
Author
Owner

@nblock commented on GitHub (Nov 8, 2025):

> I would have wished that clients stay ok, but I don't know if this was intended. Maybe migrating registered clients between 0.26 and 0.27 is not supported?

That's sad to hear. Upgrading from 0.26 to 0.27 should keep all clients connected unless they are too old, but that's likely not the problem here.

Could you please share your docker compose file?

Author
Owner

@phxyz12 commented on GitHub (Nov 8, 2025):

Hi,

here we go:

Docker Compose file for v0.26.1:

services:
  headscale:
    container_name: headscale
    image: headscale/headscale:0.26.1
    volumes:
      - ./config:/etc/headscale/  # Configuration files
      - ./data:/var/lib/headscale # Data persistence
    # ports:
    #  - "8282:8080" 
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    sysctls:
      - net.ipv4.ip_forward=1
      - net.ipv4.conf.all.src_valid_mark=1
    command: serve
    restart: unless-stopped
    labels:
      - traefik.enable=true
      - traefik.http.services.headscale.loadbalancer.server.port=8080
      - traefik.http.services.headscale.loadbalancer.server.scheme=http
      - traefik.http.routers.headscale-https.entrypoints=websecure
      - traefik.http.routers.headscale-https.rule=Host(`headscale.somedomain.org`)
      - traefik.http.routers.headscale-https.tls=true
      - traefik.http.routers.headscale-https.tls.certresolver=ionos
      - traefik.http.routers.headscale-https.middlewares=my-traefik-plugin-geoblock@file  # geo block
    networks:
      - frontend

networks:
  frontend:
    external: true

Docker compose file for v0.27.0:

services:
  headscale:
    container_name: headscale
    image: headscale/headscale:0.27.0
    volumes:
      - ./config:/etc/headscale/  # Configuration files
      - ./lib:/var/lib/headscale     # Data persistence
      - ./run:/var/run/headscale
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    sysctls:
      - net.ipv4.ip_forward=1
      - net.ipv4.conf.all.src_valid_mark=1
    command: serve
    healthcheck:
        test: ["CMD", "headscale", "health"]
    restart: unless-stopped
    labels:
      - traefik.enable=true
      - traefik.http.services.headscale.loadbalancer.server.port=8080
      - traefik.http.services.headscale.loadbalancer.server.scheme=http
      - traefik.http.routers.headscale-https.entrypoints=websecure
      - traefik.http.routers.headscale-https.rule=Host(`headscale.somedomain.org`)
      - traefik.http.routers.headscale-https.tls=true
      - traefik.http.routers.headscale-https.tls.certresolver=ionos
      - traefik.http.routers.headscale-https.middlewares=my-traefik-plugin-geoblock@file  # geo block
    networks:
      - frontend

networks:
  frontend:
    external: true

As you can see, there is not much difference. I added a mount for /var/run/headscale and a healthcheck section since I found these elements in the latest Docker Compose example file.

A side note:

I use headscale (in both versions) with the SQLite database as the backend for the ACL configs. I switched to SQLite because I wanted to use the headplane (https://headplane.net/) frontend, which only supports managing ACLs when SQLite is used.

Author
Owner

@nblock commented on GitHub (Nov 9, 2025):

The error `noise upgrade failed: noise handshake failed: decrypting machine key: chacha20poly1305: message authentication failed` occurs when the `noise_private.key` is changed.

The compose file for 0.26.1 contains:

    volumes:
      - ./config:/etc/headscale/  # Configuration files
      - ./data:/var/lib/headscale # Data persistence

while the compose file for 0.27.0 contains:

    volumes:
      - ./config:/etc/headscale/  # Configuration files
      - ./lib:/var/lib/headscale     # Data persistence
      - ./run:/var/run/headscale

Is it possible that you forgot to rename the local directory data to lib before starting 0.27? It looks like Headscale did not find the noise_private.key at some point and created it anew, causing the error.
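
One way to test this hypothesis is to compare the key file from the 0.26.1 backup with the one the 0.27.0 container is using. The following is a minimal sketch, not an official Headscale tool: the two paths and the helper names `file_sha256` and `same_key` are assumptions for illustration, and because the key file is only readable by root in the listing style used by the container, it probably has to be run with sufficient privileges.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_key(old_key: Path, new_key: Path) -> bool:
    """Return True only if both key files exist and have identical contents."""
    if not (old_key.is_file() and new_key.is_file()):
        return False
    return file_sha256(old_key) == file_sha256(new_key)


if __name__ == "__main__":
    # Hypothetical paths: point these at the 0.26.1 backup and at the
    # directory mounted to /var/lib/headscale in the 0.27.0 compose file.
    old = Path("backup/data/noise_private.key")
    new = Path("lib/noise_private.key")
    if same_key(old, new):
        print("keys match")
    else:
        print("keys differ or a file is missing")
```

If the digests differ, the server is presenting a different Noise key than the one the clients saw at registration, which would be consistent with the handshake error. If they match, the cause is likely elsewhere.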

Author
Owner

@phxyz12 commented on GitHub (Nov 9, 2025):

I checked the directories; I did rename data to lib:

drwxrwxr-x 2 user user 4096 Nov  4 15:44 config/
-rw-rw-r-- 1 user user 2407 Nov  4 15:49 docker-compose.yml
drwxrwxr-x 2 user user 4096 Oct  7 13:21 headplane-config/
drwxrwxr-x 2 user user 4096 Sep 12 11:05 headplane-data/
drwxrwxr-x 2 user user 4096 Nov  4 15:50 lib/
drwxrwxr-x 2 user user 4096 Nov  4 15:50 run/

Inside lib, I have:

-rw-r--r-- 1 root   root   290816 Nov  4 15:50 db.sqlite
-rw------- 1 root   root       72 Sep 12 10:46 noise_private.key

I see one difference in the ownership of the file db.sqlite:

v0.26.1:   -rw-r--r-- 1 user user  290816 Nov  8 21:45 db.sqlite
v0.27.0:   -rw-r--r-- 1 root   root   290816 Nov  4 15:50 db.sqlite

Ownership/permissions of the file noise_private.key are the same in both installations.

Author
Owner

@nblock commented on GitHub (Nov 12, 2025):

0.27.1 is now available, and the update should preserve node connectivity. Please give it a try.

Reference: starred/headscale#1141