mirror of
https://github.com/juanfont/headscale.git
synced 2026-01-11 20:00:28 +01:00
[Bug] Automatic database migration from 0.23.0 to 0.24.0 does not work with postgres #908
Closed
opened 2025-12-29 02:25:46 +01:00 by adam
13 comments
Originally created by @sysvinit on GitHub (Jan 17, 2025).
Is this a support request?
Is there an existing issue for this?
Current Behavior
I tried to update my headscale instance, which uses Postgres as the database, from 0.23.0 to 0.24.0 using the Debian packages provided as part of the releases on GitHub. However, after installing the new package, headscale failed to start due to problems with the database migration, with the following message in the logs:
I was also left unable to blindly downgrade to 0.23.0, as the new version had already executed a migration successfully before encountering the error, leaving my database in an inconsistent state that would not have been supported by the old version.
Expected Behavior
Headscale executes the database migration without errors, and then proceeds to function normally.
Steps To Reproduce
Environment
Runtime environment
Anything else?
I manually inspected the migrations table in the database and compared this with the code in 0.24.0, which indicates that this was a problem with migration 202407191627. This is an automatic migration executed by gorm to update the schema of the users table -- I'm not sure if there have been changes in gorm, but I'm not familiar with the library so I didn't investigate that further.
As I mentioned above, my database was left in an inconsistent state which prevented me from downgrading (in fairness I should have backed up my database before performing the upgrade...). However, my database did have a uniqueness constraint for users.name as implied by the log message above, but in my database the constraint was called users_name_key instead of uni_users_name.
I used the following SQL to rename the constraint on the users table, and with this change the migration which had been causing me problems then executed correctly:
My headscale instance now appears to be working correctly, though I haven't tried adding new users or nodes to the Tailnet yet.
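The SQL statement itself did not survive in this copy of the issue. As a minimal sketch, assuming the constraint names given above (users_name_key and uni_users_name) and a users table reachable through the default search path, inspecting and renaming the constraint in Postgres might look like the following. It is not necessarily the exact statement that was run.

```sql
-- List the constraints on the users table to confirm the current name.
-- contype 'u' marks a unique constraint.
SELECT conname, contype
FROM pg_constraint
WHERE conrelid = 'users'::regclass;

-- Rename the existing unique constraint to the name the failing migration
-- appears to expect (an assumption based on the names quoted above).
ALTER TABLE users RENAME CONSTRAINT users_name_key TO uni_users_name;
```

Taking a database backup before running anything like this is advisable, as the report itself notes.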
@kradalby commented on GitHub (Jan 19, 2025):
This is unfortunate, I will not have time to look into this until next week, but in the meantime if someone has a Postgres backup that can reproduce this issue, I would appreciate getting it.
As often mentioned, we don't have the time to do extensive testing for Postgres as we have so many other things to fix.
A personal rant;
I personally have less and less appreciation for Gorm, and if this turns out to be some sort of auto magic footgun that doesn't help.
I want to get rid of Gorm, but that means getting rid of Postgres and only supporting one database, which we are not doing. So it will likely continue like this: we try to test things, but if no one tests the betas with the setups we can't test, then this will continue to happen. In this case I suspect the beta testers didn't run Postgres, so we didn't discover it.
@sysvinit commented on GitHub (Jan 19, 2025):
I do have pg_dumpall backups of the entire postgres server on that machine as part of the daily system backups (useful for restoring after a storage failure, less so for restoring individual tables in the heat of the moment). I could get you a copy of the headscale DB from my server from the night before I attempted the upgrade, though I'll need to find some free time next week to spin up a test instance of postgres which I can load the backup into first in order to extract the headscale parts (and censor things like IP addresses). What would be the best way to send you the DB dump?
@kradalby commented on GitHub (Jan 19, 2025):
Great, email in my GitHub would be sufficient
@sysvinit commented on GitHub (Jan 23, 2025):
Given you've found a fix already -- do you still need a copy of my headscale database out of my backups?
@kradalby commented on GitHub (Jan 23, 2025):
No, thank you, I got one from another user and wrote a test based on that.
@panteparak commented on GitHub (Feb 1, 2025):
Hi @kradalby
Just a heads up, somehow the migration in the fix PR has no effect and produces the same error.
I had to copy out the SQL statement AS IS and run manually on the DB to fix it.
Upgrading from 0.22.3 -> 0.24.2 with Postgres.
If you want to investigate further, I could send you my SQL dump from v0.22.3.
@kradalby commented on GitHub (Feb 1, 2025):
Yes please. It worked from 0.23, but maybe the step from 0.22 was different; 0.23 is when we introduced migrations, so it would not surprise me. Email is in my profile.
@panteparak commented on GitHub (Feb 11, 2025):
@kradalby
Sorry for the late reply. Any tips on sanitising the DB dump? From what I can see, I should strip out IPs, nodekeys, and discokeys. Is there anything else that should be done, or is there a script to clean this all up?
@panteparak commented on GitHub (Feb 17, 2025):
@kradalby I've sent my Postgres DB dump to your email. Let me know if you need other info.
@haatveit commented on GitHub (Feb 25, 2025):
I've (finally) upgraded from 0.21.0 by starting up every release along the way in case it was needed, but haven't been able to get releases past 0.24.2 working.
On 0.23.0 I got this on first startup after migrations:
then it worked next time I started the service:
From 0.24.0 onward I get this:
If I manually add the constraint via psql, I can run 0.24.0 through 0.24.2:
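(The statement is not preserved in this copy of the thread. A sketch of what adding the constraint could look like, assuming a unique constraint on users.name named uni_users_name as described earlier in the thread:)

```sql
-- Assumed form of the manual fix: recreate the unique constraint under
-- the name the migration looks for. This fails if duplicate user names
-- exist or if a constraint with this name is already present.
ALTER TABLE users ADD CONSTRAINT uni_users_name UNIQUE (name);
```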
But if I try to upgrade to 0.24.3 or newer, I get this error on first startup:
and it removes the uni_users_name constraint again. I've looked at the diff between 0.24.2 and 0.24.3 but don't really get why that is happening.
@kradalby commented on GitHub (Feb 26, 2025):
I haven’t had a close look, but starting every release would be counterproductive. Since there are migration fixes in the fix releases, you should jump to the latest release or at least the latest fix release.
We have not had a database that old in a bit, so if going from 0.21 straight to 0.24.3 or 0.25.1 doesn’t work I would appreciate a scrubbed copy I can write a test against.
@nicka101 commented on GitHub (Feb 28, 2025):
For me, simply spamming the ALTER TABLE in Postgres manually to add the uni_users_name constraint back while headscale is starting allows the migrations to succeed and headscale to start on 0.25.1.
@bazaah commented on GitHub (Mar 22, 2025):
The PreAuthKey and/or Node automigrates here: https://github.com/juanfont/headscale/compare/v0.24.2...v0.24.3#diff-826f8967b23df1209318fba82f94d8ed3eb12136b6cb2d070ddb76d9a059c336R590-R597 are somehow responsible for the uni_users_name issue. They seemingly race to create the problem, as I see both of them in the logs
(or)
I eventually just needed to INSERT INTO migrations (id) VALUES (202501311657); to get a functional instance again. Is there anything I should run by hand to replicate the expected schema changes from https://github.com/juanfont/headscale/pull/2396?
My migration path was: 0.22.3 -> 0.23.0 -> 0.24.0 -> 0.24.2 -> 0.24.3, in case it matters.
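Before inserting a migration ID by hand as described above, it may help to see which IDs are already recorded. A minimal sketch, assuming only the migrations table and id column named in the INSERT above; wrapping the change in a transaction is a suggestion here, not something taken from the thread:

```sql
-- Show the migration IDs recorded so far.
SELECT id FROM migrations ORDER BY id;

BEGIN;
-- Record migration 202501311657 so that it is treated as already applied.
INSERT INTO migrations (id) VALUES (202501311657);
-- Re-check the table here; use ROLLBACK instead of COMMIT to undo.
COMMIT;
```

Recording an ID this way skips whatever schema changes that migration would have made, which is why the question about replicating them by hand matters.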