Unable to connect to headscale #610

Closed
opened 2025-12-29 02:21:10 +01:00 by adam · 8 comments
Owner

Originally created by @anton-livewyer on GitHub (Jan 11, 2024).

Bug description

Hi,

First of all, I want to say "Thank you" for building such a cool product!

We have run into an issue with the headscale server multiple times. When a tailscale client tries to connect to headscale, it gets a `could not register machine` error in the browser (we use OIDC with the Google provider). On the server side, the following happens at the same time:

  1. the error `ERR Failed to persist/update machine in the database error="database is locked (5) (SQLITE_BUSY)" handler=PollNetMap machine=<NODE_NAME>` appears in the log
  2. then the `ERR Failed to persist/update machine in the database error="SQL logic error: cannot start a transaction within a transaction (1)" handler=PollNetMap machine=<NODE_NAME>` error spams the server log, with different machine names in the `<NODE_NAME>` field

It is hard to say what exactly causes this, but I can definitely say that on two occasions it happened after two separate users updated their local macOS tailscale clients to the latest version and could no longer connect to the server afterwards. Once the issue appears, every user who reconnects to the server gets the same error. My understanding is that headscale tries to write some data to the database when a user connects, but cannot because the database is locked.

According to the `handler` field in the log message, the issue is linked to this function: https://github.com/juanfont/headscale/blob/main/hscontrol/poll.go#L57
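
For illustration, here is a minimal, self-contained sketch of both errors. It assumes the github.com/mattn/go-sqlite3 driver and an invented `machines` table, not headscale's actual code or schema:

```go
// Minimal sketch (not headscale's actual code or schema) of the two errors
// from the log above, assuming the github.com/mattn/go-sqlite3 driver.
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"
	"sync"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	ctx := context.Background()

	// _busy_timeout makes a writer wait up to 5s for the file lock instead of
	// failing immediately with "database is locked (5) (SQLITE_BUSY)";
	// _journal_mode=WAL lets readers proceed while a single writer holds it.
	db, err := sql.Open("sqlite3", "file:machines.db?_busy_timeout=5000&_journal_mode=WAL")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS machines (
		id INTEGER PRIMARY KEY, name TEXT, last_seen TEXT)`); err != nil {
		log.Fatal(err)
	}

	// 1. Many nodes reconnecting at once means concurrent writes to one file;
	//    without a busy timeout some of them return SQLITE_BUSY.
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			if _, err := db.Exec(
				`INSERT INTO machines (name, last_seen) VALUES (?, datetime('now'))`,
				fmt.Sprintf("node-%d", n)); err != nil {
				log.Printf("persist failed: %v", err)
			}
		}(i)
	}
	wg.Wait()

	// 2. "cannot start a transaction within a transaction (1)" is what SQLite
	//    returns when BEGIN is issued on a connection that already has an open
	//    transaction, e.g. after a failed write left one dangling.
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	if _, err := conn.ExecContext(ctx, "BEGIN"); err != nil {
		log.Fatal(err)
	}
	if _, err := conn.ExecContext(ctx, "BEGIN"); err != nil {
		log.Printf("nested begin: %v", err)
	}
	conn.ExecContext(ctx, "ROLLBACK")
}
```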

Environment

  • OS: Ubuntu 20.04
  • Headscale version: 0.20.0
  • Tailscale version: definitely occurred with `1.38.1` on macOS
  • Database: .sqlite file stored on the server
  • [ ] Headscale is behind a (reverse) proxy
  • [ ] Headscale runs in a container

To Reproduce

We tried to reproduce this by downloading an older version of the tailscale client (both Windows and Linux; we have no ability to test on Mac), connecting to the server, and then updating the client to the latest version proposed by tailscale, as that is the only trigger we are aware of. We had no success reproducing it, so to my understanding this is a bug.

adam added the stale and bug labels 2025-12-29 02:21:10 +01:00
adam closed this issue 2025-12-29 02:21:10 +01:00

@TotoTheDragon commented on GitHub (Feb 11, 2024):

@anton-livewyer The latest stable version is 0.22.3; is this reproducible in that version?


@sthomson-wyn commented on GitHub (Feb 13, 2024):

We are seeing this on 0.22.3. Not sure if it's a coincidence, but a lot of our users upgraded their tailscale clients from 1.56.x to 1.58.x today.


@sthomson-wyn commented on GitHub (Feb 13, 2024):

I'll also mention that this seems to occur after we restart our headscale deployment in kubernetes. I imagine that any brief overlap between pod uptimes may be the cause of db locking
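
For illustration, a minimal sketch of that overlap hypothesis (assuming the github.com/mattn/go-sqlite3 driver and an invented schema; not headscale's actual code), where two *sql.DB handles stand in for the two overlapping pods:

```go
// Sketch of the pod-overlap hypothesis: two *sql.DB handles stand in for two
// overlapping pods sharing one SQLite file. Paths and schema are made up.
package main

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	ctx := context.Background()

	// "Old pod": still running during the rollout.
	oldPod, err := sql.Open("sqlite3", "file:headscale.db")
	if err != nil {
		panic(err)
	}
	defer oldPod.Close()
	if _, err := oldPod.Exec(`CREATE TABLE IF NOT EXISTS machines (id INTEGER PRIMARY KEY, name TEXT)`); err != nil {
		panic(err)
	}

	// Pin one connection so its open transaction (and the write lock) persists.
	oldConn, err := oldPod.Conn(ctx)
	if err != nil {
		panic(err)
	}
	defer oldConn.Close()
	// BEGIN IMMEDIATE takes the write lock right away, much like an in-flight
	// "persist/update machine" that has not committed yet.
	if _, err := oldConn.ExecContext(ctx, "BEGIN IMMEDIATE"); err != nil {
		panic(err)
	}

	// "New pod": opens the same file; its first write waits for the (short)
	// busy timeout and then fails with "database is locked (5) (SQLITE_BUSY)".
	newPod, err := sql.Open("sqlite3", "file:headscale.db?_busy_timeout=100")
	if err != nil {
		panic(err)
	}
	defer newPod.Close()
	_, err = newPod.Exec(`INSERT INTO machines (name) VALUES ('node-a')`)
	fmt.Println("new pod write:", err)

	// Once the old pod actually exits and releases the lock, writes succeed.
	oldConn.ExecContext(ctx, "ROLLBACK")
}
```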


@TotoTheDragon commented on GitHub (Feb 13, 2024):

> I'll also mention that this seems to occur after we restart our headscale deployment in kubernetes. I imagine that any brief overlap between pod uptimes may be the cause of db locking

Yes, that makes a lot of sense. I don't expect this to be fixed in v0.22, but I will make a ticket to make sure the database is properly closed on kill in v0.23.

For your current use case, switching to Postgres might be a viable solution to the locking problem.


@sthomson-wyn commented on GitHub (Feb 13, 2024):

We're currently switching to a StatefulSet instead of a Deployment (should've done that in the first place) to address the overlap.

Postgres is a good idea, we'll do that later too. Thanks @TotoTheDragon


@TotoTheDragon commented on GitHub (Feb 13, 2024):

> We're currently switching to a StatefulSet instead of a Deployment (should've done that in the first place) to address the overlap.
>
> Postgres is a good idea, we'll do that later too. Thanks @TotoTheDragon

Alright, when you have tested the new environment, please let us know if anything has changed.


@github-actions[bot] commented on GitHub (May 14, 2024):

This issue is stale because it has been open for 90 days with no activity.


@github-actions[bot] commented on GitHub (May 21, 2024):

This issue was closed because it has been inactive for 14 days since being marked as stale.

Reference: starred/headscale#610