mirror of
https://github.com/juanfont/headscale.git
synced 2026-01-12 04:10:32 +01:00
[Bug] Latest Docker Release Breaks Foreign Key Constraint in Database #921
Closed
opened 2025-12-29 02:26:00 +01:00 by adam
·
41 comments
Originally created by @xcjs on GitHub (Jan 23, 2025).
Is this a support request?
Is there an existing issue for this?
Current Behavior
Upon Watchtower updating the Headscale container, the following log messages repeat until the container dies:
The container updates from the `latest` tag, but it looks like several other container versions were pushed at the same time. I'm assuming (but haven't verified yet) that this correlates with the 0.24.1 release, but definitely the release that occurred ~20 minutes ago.
Expected Behavior
I would expect the database migration to succeed.
Steps To Reproduce
Updating the Docker container from a prior release to the release triggered 20 minutes ago on the latest tag should recreate this error.
Environment
Runtime environment
Anything else?
No response
@simonccc commented on GitHub (Jan 23, 2025):
I also had this issue (in SQLite), but running directly with no Docker. I deleted the node using the command line and it started up OK.
@kradalby commented on GitHub (Jan 23, 2025):
Which release did you upgrade from? This is unfortunate. This part of the code seems to be a minefield, as there have been years with no constraints, allowing a lot of stale, broken nodes/routes/keys to be written to the database.
The biggest challenge for me is figuring out what will blow up or needs to be removed beforehand, and what will not.
@xcjs commented on GitHub (Jan 23, 2025):
@kradalby I was/am, on the `:latest` Docker tag, so I'm assuming whatever the prior major version was. Watchtower keeps my containers updated for me so I force myself to stay up to date on things. Based on other Docker tags out there, it was probably `0.24.0` or `0.23.0`?
Whatever version was assigned to the `latest` tag earlier today would be the version I was on.
@xcjs commented on GitHub (Jan 23, 2025):
I can also say that pulling a backup and using any other container image updated today results in the same issue, so I was not able to go back to an earlier version unless it was really old.
@LucasJanin commented on GitHub (Jan 23, 2025):
Same issue on my side: I tried to upgrade my Headscale from 0.23 to 0.24.1.
@kradalby commented on GitHub (Jan 23, 2025):
If anyone has a database that they could strip of personal info and send to me, that would be helpful for me to include in a test case.
My email is in my profile.
@kradalby commented on GitHub (Jan 23, 2025):
I have not seen this issue on any of my "long running instances", so I suspect that there is an old node that has been around since before a certain constraint was introduced and is now violating it.
@xcjs commented on GitHub (Jan 23, 2025):
@kradalby It depends on what "long running" means in this context, but my server has been up for 2-3 years.
@LucasJanin commented on GitHub (Jan 23, 2025):
Thanks @xcjs for your solution!
My Headscale v0.24.1 works perfectly!!!
@xcjs commented on GitHub (Jan 23, 2025):
I can probably do this with my backup copy of my database, but allow me a few days to look into scrubbing it.
@xcjs commented on GitHub (Jan 23, 2025):
Thanks for publishing this here! I was going to try and mention it in passing at least for other people facing the issue.
@yqs112358 commented on GitHub (Jan 24, 2025):
Same issue encountered. Thanks so much for your solution here 😊
@kradalby commented on GitHub (Jan 24, 2025):
For those of you who encountered this, was it typically one node that was bad? Or multiple?
@ghost commented on GitHub (Jan 24, 2025):
@kradalby multiple in my case (7 devices out of 12). Mostly devices that haven't authenticated in the last few months.
@masterwishx commented on GitHub (Jan 24, 2025):
Same issue for me
@masterwishx commented on GitHub (Jan 24, 2025):
@kradalby commented on GitHub (Jan 24, 2025):
I'm just trying to understand what the trigger for this was. Has the pre_auth_key been deleted for the nodes that violate the constraint?
@masterwishx commented on GitHub (Jan 24, 2025):
Seems the issue is caused by having = 0 instead of NULL, and I think they have all expired.
@masterwishx commented on GitHub (Jan 24, 2025):
I had 3, 8, 9, 10, 11: id 3 = expired, ids 8-11 = do not exist at all (maybe deleted). I have only preauthkeys 1-7 in the list.
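The situation described here (nodes pointing at pre-auth keys that no longer exist) can be checked for before changing anything. Below is a minimal sketch using Python's standard `sqlite3` module. It assumes a `nodes` table with `id` and `auth_key_id` columns and a `pre_auth_keys` table with an `id` column; the `pre_auth_keys` table name and the function name are assumptions for illustration and are not confirmed anywhere in this thread. Run it against a copy of the database, not the live file.

```python
import sqlite3


def find_dangling_auth_keys(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (node_id, auth_key_id) pairs that match no existing pre-auth key.

    Assumes tables `nodes(id, auth_key_id)` and `pre_auth_keys(id)`.
    The table name `pre_auth_keys` is an assumption for illustration.
    """
    rows = conn.execute(
        """
        SELECT n.id, n.auth_key_id
        FROM nodes AS n
        LEFT JOIN pre_auth_keys AS k ON k.id = n.auth_key_id
        WHERE n.auth_key_id IS NOT NULL  -- NULL means "no key", which is fine
          AND k.id IS NULL               -- no matching parent row was found
        ORDER BY n.id
        """
    ).fetchall()
    return [(node_id, key_id) for node_id, key_id in rows]
```

A value of 0 shows up in this list as well, provided no pre-auth key has the id 0, so the query covers both the "= 0" case and the "key was deleted" case.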
@masterwishx commented on GitHub (Jan 24, 2025):
Thanks, it worked, but only on the old DB from before the update.
@kradalby commented on GitHub (Jan 24, 2025):
Interesting. As part of fixing up Tags, which has to be done and will be done in the not-so-distant future, we need to always keep the pre_auth_key, and it cannot be deleted as it contains information about the tags of a node.
Previously we did not appreciate this constraint and allowed them to be deleted, and this is now haunting us.
@super-ben commented on GitHub (Jan 25, 2025):
I can also confirm that the sqlite magic in the previous comment did work (in my case, there were 3 problematic rows). As I'm using the `latest` tag in Docker and upgraded from 0.24.0 (so not a major version upgrade), this was somewhat unexpected, but glad there was a quick solution.
@xlemassacre commented on GitHub (Jan 26, 2025):
Also had the same issue upgrading from 0.23.0 to 0.24.1.
The command above updating auth_key_id from '0' to NULL on 5 nodes solved it.
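The workaround reported here (changing `auth_key_id` from 0 to NULL) could be scripted roughly as follows. This is a sketch under stated assumptions, not the command originally posted in the thread: it uses Python's standard `sqlite3` module, the function name and file paths are hypothetical, and it takes a backup first because several commenters needed one. Stop Headscale before running anything like this.

```python
import sqlite3


def null_zero_auth_keys(db_path: str, backup_path: str) -> int:
    """Back up the database, then set nodes.auth_key_id = NULL where it is 0.

    Returns the number of rows changed. `db_path` and `backup_path` are
    hypothetical paths supplied by the caller.
    """
    src = sqlite3.connect(db_path)
    try:
        # Take a consistent copy first so the change can be undone.
        dst = sqlite3.connect(backup_path)
        try:
            src.backup(dst)
        finally:
            dst.close()

        with src:  # commits on success, rolls back on an exception
            cur = src.execute(
                "UPDATE nodes SET auth_key_id = NULL WHERE auth_key_id = 0"
            )
            return cur.rowcount
    finally:
        src.close()
```

The return value gives a quick sanity check: it should match the number of problematic rows you counted beforehand (5 in the comment above).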
@NOP4 commented on GitHub (Jan 28, 2025):
This fix doesn't work for me. I still have the same error after setting auth_key_id to NULL for row 4:
I checked in the database, and it was 0, now it's NULL.
I see all devices registered with register_method "cli" have NULL now. Devices registered with authkey have a number. row_id 4 is the first entry with "register_method = cli" in the table.
Any idea?
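When the error persists after one row has been fixed, it can help to ask SQLite itself which rows violate a declared foreign key, since the failing constraint may not be the one you expect. The sketch below uses SQLite's `PRAGMA foreign_key_check` through Python's standard `sqlite3` module; the function name is hypothetical. One caveat: the pragma only reports violations of foreign keys declared in the schema of the file being inspected, so it may report nothing on a pre-migration database if the constraint is only introduced by the migration.

```python
import sqlite3


def foreign_key_violations(conn: sqlite3.Connection) -> list[tuple]:
    """Return SQLite's report of rows that violate a declared foreign key.

    Each tuple is (child_table, rowid, parent_table, fk_index), as produced
    by `PRAGMA foreign_key_check`.
    """
    return conn.execute("PRAGMA foreign_key_check").fetchall()
```

For example, `foreign_key_violations(sqlite3.connect("copy-of-db.sqlite"))` (with a hypothetical file name) lists every offending row, which shows whether the remaining problem is in `nodes` or in another table.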
@NOP4 commented on GitHub (Jan 28, 2025):
I've done a bit of digging:
Switching back to version 0.23 or 0.24 does not work as the database is now corrupted.
I reloaded some backup, and when I start with version 0.23, everything works fine.
The problem is clearly in the database conversion process between 0.23 and 0.24/latest.
The problem is... it corrupts the database. Maybe, while the root cause is investigated and a fix is implemented, you should roll back the Docker Hub version before too many databases are corrupted and require a restore from backup, like I had to do.
@xcjs commented on GitHub (Jan 28, 2025):
If you're still getting that error, it doesn't sound like the row was updated properly. Can you check again?
@xcjs commented on GitHub (Jan 28, 2025):
I ran into the same issue, but I think it's because all the container versions have the new migrations as they were all updated around the same time.
If a rollback occurs, this could break other environments that have already upgraded without issue or that had the work-around applied.
@NOP4 commented on GitHub (Jan 28, 2025):
Yes, you're right, can be dangerous rolling back...
Switching from 0.23 to 0.24 I have:
I stop the docker container, then edit the database:
Restart docker container, still in version 0.24 and I have the same error:
Content of table is:

I also tried to correct the table before the migration: Start from a working version 0.23. Stop container. Edit table. Start with version 0.24 => same error.
@panteparak commented on GitHub (Feb 1, 2025):
I am seeing this as well, attempting to upgrade from 0.22.3 -> 0.24.2.
What I've encountered is that `auth_key_id` is 0, which causes a foreign key issue.
What I've noticed is that all users that have `auth_key_id = 0` are users authenticated via OIDC (Azure Entra).
Will update below on my findings after my manual data-only import script has been fixed with this.
@danielw97 commented on GitHub (Feb 5, 2025):
Thanks a bunch for the solution here. I also had this issue when upgrading to v0.25.0-beta.1. In my case my machine was a bit behind, coming from v0.23.0-alpha12, but everything is working as expected now.
@david0161 commented on GitHub (Feb 6, 2025):
Running in Unraid, updated db and same issue.
@nblock commented on GitHub (Feb 6, 2025):
We'd like to have a look at this. Please send a copy of an old database (0.22, 0.23, …) that fails to upgrade to @kradalby (email in his profile). Please strip sensitive data before sending. Thanks!
@vbrandl commented on GitHub (Feb 6, 2025):
I made a backup of my SQLite before applying the `UPDATE nodes SET auth_key_id = NULL WHERE id IN (...)`. This fixed the problem for me, headscale started again and everything seems to work.
Do you still need an example of a database that breaks the update? If so, can you help me strip personal info from the DB? Then I can give you a copy.
Edit:
Is this enough to remove personal info or did I miss anything? Are those `*_key` columns in `nodes` public or private keys? Should I remove those?
@kradalby commented on GitHub (Feb 6, 2025):
Yes please, the more the better. I am unsure why it is still around; there are too many combinations of versions, I suppose.
Please email them to me, and note which version it was on and which version you upgraded to.
Strip all info that is PII and similar, as I would like to include them in tests, which means checking them into git.
@kradalby commented on GitHub (Feb 6, 2025):
Only public, so I would say username, hostname and potentially email are the relevant ones.
Please test a copy so that it actually still breaks when it is sent to me; as in, don't remove something that "fixes" the problem.
@vbrandl commented on GitHub (Feb 6, 2025):
I sent you a mail with my broken DB and between which versions the bug appeared.
@xcjs commented on GitHub (Feb 6, 2025):
Not the developer, but thanks for doing that! I hadn't gotten around to scrubbing mine yet, and I also wasn't exactly sure what version I was on.
@vbrandl commented on GitHub (Feb 6, 2025):
I sent @kradalby the script, too, but if you are interested, here is what I did to strip my DB:
I overwrote all PII columns (at least all columns I could find), then I dumped the DB and imported it into a new, clean SQLite DB to remove possible dead rows that still contain the old PII.
And as luck would have it, I didn't remove the previous docker image so I knew my update path :)
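The two-step scrub described in this comment (overwrite the PII columns, then dump and re-import into a fresh file so that leftover pages holding old values are not carried over) might look roughly like the sketch below. It is not the script that was emailed; it uses Python's standard `sqlite3` module, the function name is hypothetical, and which tables and columns hold PII is left to the caller because the thread does not list them. It assumes ordinary rowid tables and a destination file that does not exist yet.

```python
import sqlite3


def scrub_and_rebuild(
    src_path: str, dst_path: str, pii_columns: dict[str, list[str]]
) -> None:
    """Overwrite PII columns in a copy, then rebuild the copy from a SQL dump.

    `pii_columns` maps table name -> column names to overwrite. The caller
    decides which columns count as PII; nothing here knows the real schema.
    """
    scratch = sqlite3.connect(":memory:")
    src = sqlite3.connect(src_path)
    try:
        src.backup(scratch)  # work on a copy, never on the original file
    finally:
        src.close()

    with scratch:
        for table, columns in pii_columns.items():
            for column in columns:
                # Identifiers cannot be bound as parameters, so quote them.
                # Appending rowid keeps values distinct for UNIQUE columns.
                sql = f"UPDATE \"{table}\" SET \"{column}\" = 'redacted-' || rowid"
                scratch.execute(sql)

    # Replaying the dump into a new file writes only live rows.
    dst = sqlite3.connect(dst_path)
    try:
        dst.executescript("\n".join(scratch.iterdump()))
    finally:
        dst.close()
        scratch.close()
```

As requested earlier in the thread, it is worth confirming that the scrubbed copy still fails to migrate before sending it, since scrubbing must not accidentally "fix" the rows that trigger the error.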
@kradalby commented on GitHub (Feb 7, 2025):
I've posted a fix for the pre_auth_key constraint failure now. I am going to test it a bit and then release it in `0.24.3` and `0.25.0-beta.2`.
https://github.com/juanfont/headscale/compare/v0.24.2...kradalby:headscale:kradalby/release-v0.24.3?expand=1
@kradalby commented on GitHub (Feb 7, 2025):
I'm gonna close this ticket after those releases, as it is now a mash of different versions and different foreign key errors.
If you are running into more issues, always try to upgrade straight to the latest version (`0.24.3`), which contains most fixes.
If you are still having problems, please open a new issue, but be explicit about:
I also appreciate databases being emailed to me stripped of info so I can check them in to git as a part of tests.
@danielw97 commented on GitHub (Feb 7, 2025):
Thanks a lot, I've tested the migration from my 0.23.0 alpha database that I kept a backup copy of going to 0.25.0 beta 2 and things now work as expected.
Appreciate your work on this, I'll also send you a copy of the old database after I strip it.