mirror of
https://github.com/juanfont/headscale.git
synced 2026-01-11 20:00:28 +01:00
[Bug] Upgrade from version 0.22.3 to 0.23-beta2 failed database migration #773
Closed
opened 2025-12-29 02:23:50 +01:00 by adam
·
10 comments
No Branch/Tag Specified
main
update_flake_lock_action
gh-pages
kradalby/release-v0.27.2
dependabot/go_modules/golang.org/x/crypto-0.45.0
dependabot/go_modules/github.com/opencontainers/runc-1.3.3
copilot/investigate-headscale-issue-2788
copilot/investigate-visibility-issue-2788
copilot/investigate-issue-2833
copilot/debug-issue-2846
copilot/fix-issue-2847
dependabot/go_modules/github.com/go-viper/mapstructure/v2-2.4.0
dependabot/go_modules/github.com/docker/docker-28.3.3incompatible
kradalby/cli-experiement3
doc/0.26.1
doc/0.25.1
doc/0.25.0
doc/0.24.3
doc/0.24.2
doc/0.24.1
doc/0.24.0
kradalby/build-docker-on-pr
topic/docu-versioning
topic/docker-kos
juanfont/fix-crash-node-id
juanfont/better-disclaimer
update-contributors
topic/prettier
revert-1893-add-test-stage-to-docs
add-test-stage-to-docs
remove-node-check-interval
fix-empty-prefix
fix-ephemeral-reusable
bug_report-debuginfo
autogroups
logs-to-stderr
revert-1414-topic/fix_unix_socket
rename-machine-node
port-embedded-derp-tests-v2
port-derp-tests
duplicate-word-linter
update-tailscale-1.36
warn-against-apache
ko-fi-link
more-acl-tests
fix-typo-standalone
parallel-nolint
tparallel-fix
rerouting
ssh-changelog-docs
oidc-cleanup
web-auth-flow-tests
kradalby-gh-runner
fix-proto-lint
remove-funding-links
go-1.19
enable-1.30-in-tests
0.16.x
cosmetic-changes-integration
tmp-fix-integration-docker
fix-integration-docker
configurable-update-interval
show-nodes-online
hs2021
acl-syntax-fixes
ts2021-implementation
fix-spurious-updates
unstable-integration-tests
mandatory-stun
embedded-derp
prtemplate-fix
v0.28.0-beta.1
v0.27.2-rc.1
v0.27.1
v0.27.0
v0.27.0-beta.2
v0.27.0-beta.1
v0.26.1
v0.26.0
v0.26.0-beta.2
v0.26.0-beta.1
v0.25.1
v0.25.0
v0.25.0-beta.2
v0.24.3
v0.25.0-beta.1
v0.24.2
v0.24.1
v0.24.0
v0.24.0-beta.2
v0.24.0-beta.1
v0.23.0
v0.23.0-rc.1
v0.23.0-beta.5
v0.23.0-beta.4
v0.23.0-beta3
v0.23.0-beta2
v0.23.0-beta1
v0.23.0-alpha12
v0.23.0-alpha11
v0.23.0-alpha10
v0.23.0-alpha9
v0.23.0-alpha8
v0.23.0-alpha7
v0.23.0-alpha6
v0.23.0-alpha5
v0.23.0-alpha4
v0.23.0-alpha4-docker-ko-test9
v0.23.0-alpha4-docker-ko-test8
v0.23.0-alpha4-docker-ko-test7
v0.23.0-alpha4-docker-ko-test6
v0.23.0-alpha4-docker-ko-test5
v0.23.0-alpha-docker-release-test-debug2
v0.23.0-alpha-docker-release-test-debug
v0.23.0-alpha4-docker-ko-test4
v0.23.0-alpha4-docker-ko-test3
v0.23.0-alpha4-docker-ko-test2
v0.23.0-alpha4-docker-ko-test
v0.23.0-alpha3
v0.23.0-alpha2
v0.23.0-alpha1
v0.22.3
v0.22.2
v0.23.0-alpha-docker-release-test
v0.22.1
v0.22.0
v0.22.0-alpha3
v0.22.0-alpha2
v0.22.0-alpha1
v0.22.0-nfpmtest
v0.21.0
v0.20.0
v0.19.0
v0.19.0-beta2
v0.19.0-beta1
v0.18.0
v0.18.0-beta4
v0.18.0-beta3
v0.18.0-beta2
v0.18.0-beta1
v0.17.1
v0.17.0
v0.17.0-beta5
v0.17.0-beta4
v0.17.0-beta3
v0.17.0-beta2
v0.17.0-beta1
v0.17.0-alpha4
v0.17.0-alpha3
v0.17.0-alpha2
v0.17.0-alpha1
v0.16.4
v0.16.3
v0.16.2
v0.16.1
v0.16.0
v0.16.0-beta7
v0.16.0-beta6
v0.16.0-beta5
v0.16.0-beta4
v0.16.0-beta3
v0.16.0-beta2
v0.16.0-beta1
v0.15.0
v0.15.0-beta6
v0.15.0-beta5
v0.15.0-beta4
v0.15.0-beta3
v0.15.0-beta2
v0.15.0-beta1
v0.14.0
v0.14.0-beta2
v0.14.0-beta1
v0.13.0
v0.13.0-beta3
v0.13.0-beta2
v0.13.0-beta1
upstream/v0.12.4
v0.12.4
v0.12.3
v0.12.2
v0.12.2-beta1
v0.12.1
v0.12.0-beta2
v0.12.0-beta1
v0.11.0
v0.10.8
v0.10.7
v0.10.6
v0.10.5
v0.10.4
v0.10.3
v0.10.2
v0.10.1
v0.10.0
v0.9.3
v0.9.2
v0.9.1
v0.9.0
v0.8.1
v0.8.0
v0.7.1
v0.7.0
v0.6.1
v0.6.0
v0.5.2
v0.5.1
v0.5.0
v0.4.0
v0.3.6
v0.3.5
v0.3.4
v0.3.3
v0.3.2
v0.3.1
v0.3.0
v0.2.2
v0.2.1
v0.2.0
v0.1.1
v0.1.0
Labels
Clear labels
CLI
DERP
DNS
Nix
OIDC
SSH
bug
database
documentation
duplicate
enhancement
faq
good first issue
grants
help wanted
might-come
needs design doc
needs investigation
no-stale-bot
out of scope
performance
policy 📝
pull-request
question
regression
routes
stale
tags
tailscale-feature-gap
well described ❤️
wontfix
Mirrored from GitHub Pull Request
No Label
bug
Milestone
No items
No Milestone
Projects
Clear projects
No project
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: starred/headscale#773
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @mpoindexter on GitHub (Aug 23, 2024).
Is this a support request?
Is there an existing issue for this?
Current Behavior
Upgrading from 0.22.3 to 0.23-beta2 failed on database migration with this error:
Expected Behavior
Migration completes successfully
Steps To Reproduce
Have an existing database where there are rows in the
routestable that reference nodes that no longer exist in thenodestable. Not sure how it happened, but seems like I had a number of these, seems like older versions of the code could cause this to happen.Running
delete from routes where node_id not in (select id from nodes);and then attempting to upgrade again solved the issue, but ideally this would be done as part of the migration scripts.Environment
Runtime environment
Anything else?
No response
@kradalby commented on GitHub (Aug 23, 2024):
I think I have fixed this in #2076, do you have the ability to test it?
It is some tech debts from the early days of non structured migrations that keep biting us.
@mpoindexter commented on GitHub (Aug 23, 2024):
In some sense the change in #2076 works in that migration does not fail. It does mean that there is now a difference in the structure of the
routestable between a fresh database and my upgraded one - specifically, the foreign key betweenroutesandnodedoes not get created. This doesn't seem ideal, IMO it is best to have the migration scripts remove any invalid data vs continuing to have data that doesn't conform to the intended schema still around and creating problems.@mpoindexter commented on GitHub (Aug 23, 2024):
I tried as an alternate fix the following code vs only running automigrate if the table does not exist:
and it solved this problem, and did not delete my existing routes (other than the ones that were linked to nodes that did not exist), and I ended up with the correct foreign keys created.
@kradalby commented on GitHub (Aug 24, 2024):
apologise, when I posted my reply to this PR I thought I was posting on #2063, so indeed, this has not been fixed, my apologies. Too late on a friday...
Thank you for investigating, I'll take this and amend the PR.
@kradalby commented on GitHub (Aug 24, 2024):
This fix breaks #2063, so I will have to investigate. The whole initial migration is incredibly fragile and horrible, but untangling it is next to impossible considering we dont have databases to verify with.
Do you have an unupgraded 0.22 database you could email me that I can test with?
@mpoindexter commented on GitHub (Aug 26, 2024):
Yes, I have a development one that I have a backup of. I'll update it to remove any sensitive information and send it over today. FWIW, I'm having trouble seeing how the AutoMigrate command would drop rows as described in #2063 - (https://github.com/go-gorm/gorm/blob/v1.25.10/migrator/migrator.go#L121) doesn't seem to have any logic that would drop rows
@kradalby commented on GitHub (Aug 27, 2024):
Could you by any chance create the dev one as sqlite? it would make it infinitely easier to make it part of a unit test. And it is our recommended database.
I also do not understand why its dropped, but I'm gonna pay back the missing techdebt by adding tests for migrations, if not only to make sure we dont regress.
@kradalby commented on GitHub (Aug 27, 2024):
I've converted it to SQLite and made this integration test:
ac045c3(#2076)I am not sure if it ended up correctly, but at least now I got two failing tests. I'll try to continue as I have time, but please feel free to have a look.
@kradalby commented on GitHub (Aug 27, 2024):
@mpoindexter I ended up with a combination, now the migration will use your code to clean up, and then only run the automigrate conditionally.
The foreign keys will not be "created" retroactively, and I am not sure if that is a common thing for the db engine to sort out. For now, I wont spent more time on this as its more of a inconvenience and the code does not rely on fk or cascading deletes so its a minor issue.
@mpoindexter commented on GitHub (Aug 28, 2024):
@kradalby I think there is still a problem with how it has been done. I spent a while tracing down why the
routestable gets cleared on migration and found the following:A number of schema migrations in sqlite are implemented by GORM using the pattern:
See https://github.com/glebarez/sqlite/blob/master/migrator.go#L376
Specifically, alter column, drop column, create constraint, drop constraint are implemented this way.
When step 3 of the above pattern occurs, any
ON DELETE CASCADErelations to the table are dropped. In this case, the relation fromnodestoroutescascades like this (https://github.com/juanfont/headscale/blob/main/hscontrol/types/node.go#L128)The implication of this, is that with the change done in #2076 the routes rows are not dropped for existing databases, since the automigration is not done that actually creates the constraints with the ON DELETE CASCADE. For new databases, the constraint will still be created, and the next time you drop a column from the
nodestable, or add a constraint to that table, the routes table will get cleared. So there's a big trap lurking for you in the future for databases that were not migrated from 0.22.There is a way to deal with this in sqlite: foreign key processing can be turned off temporarily using a pragma and then turned back on. The GORM migrator does this for alter column (https://github.com/glebarez/sqlite/blob/master/migrator.go#L79), but not for drop column/create constraint/drop constraint. I am not sure why that is and it seems like a bug.
sqlite provides guidance on how to perform these migrations here: https://www.sqlite.org/lang_altertable.html - see section 7. Based on that, I would propose that the right way to solve the problem is to:
I have tested this locally, and both test cases you added pass with my changes. Let me know if you would like me to submit a PR for this. I think following this approach will make life a lot easier in the future since you won't have to worry about two different upgrade paths (from a DB that has the constraints created on
routesand from a DB that doesn't have the constraints created onroutes)