[PR #2617] [MERGED] db: add sqlite "source of truth" schema #2764

Closed
opened 2025-12-29 04:18:53 +01:00 by adam · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/juanfont/headscale/pull/2617
Author: @kradalby
Created: 5/21/2025
Status: Merged
Merged: 7/7/2025
Merged by: @kradalby

Base: mainHead: kradalby/schema-truth


📝 Commits (5)

  • d8c63b9 db: add sqlite "source of truth" schema
  • 970db73 changelog: add entry for db
  • 4d160a0 .github/workflow: only run a few selected postgres tests
  • 8e5e07e flake: dont override gopls
  • 3742424 .github/workflows: prettier

📊 Changes

261 files changed (+6461 additions, -306 deletions)

View changed files

📝 .github/ISSUE_TEMPLATE/bug_report.yaml (+6 -2)
📝 .github/ISSUE_TEMPLATE/feature_request.yaml (+6 -3)
📝 .github/workflows/build.yml (+9 -3)
📝 .github/workflows/check-tests.yaml (+3 -1)
📝 .github/workflows/docs-deploy.yml (+2 -1)
📝 .github/workflows/gh-action-integration-generator.go (+21 -4)
.github/workflows/integration-test-template.yml (+95 -0)
📝 .github/workflows/lint.yml (+16 -5)
📝 .github/workflows/release.yml (+3 -1)
📝 .github/workflows/stale.yml (+6 -2)
📝 .github/workflows/test-integration.yaml (+22 -78)
📝 .github/workflows/test.yml (+3 -1)
📝 CHANGELOG.md (+44 -1)
📝 cmd/headscale/cli/serve.go (+8 -0)
📝 flake.nix (+4 -4)
📝 go.mod (+2 -2)
📝 go.sum (+2 -2)
📝 hscontrol/db/db.go (+336 -39)
📝 hscontrol/db/db_test.go (+639 -118)
📝 hscontrol/db/ephemeral_garbage_collector_test.go (+22 -20)

...and 80 more files

📄 Description

This PR is kind of like a "stretch goal".

This PR is very much needed to try to address the longstanding issues the database schema and data integrity has.

This adds a schema.sql which should be the source of truth representation of our schema after all migration is applied (for sqlite). This is to ensure you can look one place to figure out how the current schema is defined and not have to do the patchwork of migrations in your head to get an overview.

Thanks to @nblock for discovering this.

How did we get here?

As far as I have understood, this is a combination of several things, parts of things we did not know at the time, didn't think about and naivety. I have at least identified the following:

SQLite requires foreign key to be set on each connection
Essentially, from the early birth of Headscale, we added constraint to the schema via GORM, but for many many early versions, they were never enforced because SQLite was started without setting the PRAGMA foreign_keys, meaning it will just ignore them all.
Running for years without any enforcing likely let all sorts of odd relationships of data creep in, causing havoc when we eventually enabled it.

This has since been addressed, and from this PR, we will also enforce foreign keys for all migrations (we had to turn it off for some old migrations as they are a bit all over the place).

All migrations was done in ONE function
Until Headscale version 0.23.0, all migrations was just one big function, what does that mean? it meant that no matter which migrations version you started at, or if you had parts of the migrations applied previously, Headscale had no clue and just tried to do them all. This is not very smart, and let us say, it is amazing that it has not ended up worse than it has.

This has since been addressed and all changes are now versioned and runs only if needed.

GORM AutoMigrate is a bit of a trap
The ORM Headscale uses, called GORM has a potentially very nice migration helper called AutoMigrate, simplified, it will look at the Model (your go struct) and the current table, calculate the difference and make a migration path.

The problem here is that if you change a model multiple times over several versions and call AutoMigrate in the first migration, then it will apply changes that are even further ahead than it is suppose to. It does not really use the "snap shot" of the Model from the time of the code base where the migration was written. This is kind of mind field, because you can end up trying to apply a migration in between two AutoMigrate call that depends on the data being in the shape of an older model, before upgrading to the final state.
I am sure we "used AutoMigrate wrong", and it is probably even covered in the docs, but since we often try to clean up old code, we do not want to have a lot of deprecated fields on our models. It just does not fit our code. In addition, the opaque-ness of using AutoMigrate hides away "what is actually changing" for someone who comes back to the migration later and needs to debug something or wants to understand what goes on.

To handle this, from this PR, we will no longer accept the use of AutoMigrate moving forward, and all migrations will have to be explicit.

Why is this PR 6000+ lines

We are doing a very large migration that essentially rebuilds all of the SQLite tables to ensure they are equal to the new truth schema on disk. As some tables out there might have undeleted columns that is not used anymore (AutoMigrate does not remove columns), we have opted for rebuilding.

To make sure this is safe, we have done the following:

  • For every version of headscale from 0.2.0 to 0.26.1 we have

    • extracted the schema at the time
    • dumped a database (empty, but with partial migrations)
    • test that it can be migrated to the newest schema and matches
  • For the other databases we have got hold of with previously "broken" migrations

    • We test migration to the new version
    • We try to validate that the data is as expected when it has finished migration

Note: None of these changes will apply to Postgres which is in a "maintenance only" mode.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/juanfont/headscale/pull/2617 **Author:** [@kradalby](https://github.com/kradalby) **Created:** 5/21/2025 **Status:** ✅ Merged **Merged:** 7/7/2025 **Merged by:** [@kradalby](https://github.com/kradalby) **Base:** `main` ← **Head:** `kradalby/schema-truth` --- ### 📝 Commits (5) - [`d8c63b9`](https://github.com/juanfont/headscale/commit/d8c63b92668a3d27ecb46111957a86ef0b6d273d) db: add sqlite "source of truth" schema - [`970db73`](https://github.com/juanfont/headscale/commit/970db736dff248d703abd3b747d1bd089ae88288) changelog: add entry for db - [`4d160a0`](https://github.com/juanfont/headscale/commit/4d160a0a9d3d747df5cd0c71be30745b81fb327d) .github/workflow: only run a few selected postgres tests - [`8e5e07e`](https://github.com/juanfont/headscale/commit/8e5e07e4817af288ab7bc0ff44f2d18699b9bffb) flake: dont override gopls - [`3742424`](https://github.com/juanfont/headscale/commit/37424246477d2faa47bf573f3bdfd1f270d3759b) .github/workflows: prettier ### 📊 Changes **261 files changed** (+6461 additions, -306 deletions) <details> <summary>View changed files</summary> 📝 `.github/ISSUE_TEMPLATE/bug_report.yaml` (+6 -2) 📝 `.github/ISSUE_TEMPLATE/feature_request.yaml` (+6 -3) 📝 `.github/workflows/build.yml` (+9 -3) 📝 `.github/workflows/check-tests.yaml` (+3 -1) 📝 `.github/workflows/docs-deploy.yml` (+2 -1) 📝 `.github/workflows/gh-action-integration-generator.go` (+21 -4) ➕ `.github/workflows/integration-test-template.yml` (+95 -0) 📝 `.github/workflows/lint.yml` (+16 -5) 📝 `.github/workflows/release.yml` (+3 -1) 📝 `.github/workflows/stale.yml` (+6 -2) 📝 `.github/workflows/test-integration.yaml` (+22 -78) 📝 `.github/workflows/test.yml` (+3 -1) 📝 `CHANGELOG.md` (+44 -1) 📝 `cmd/headscale/cli/serve.go` (+8 -0) 📝 `flake.nix` (+4 -4) 📝 `go.mod` (+2 -2) 📝 `go.sum` (+2 -2) 📝 `hscontrol/db/db.go` (+336 -39) 📝 `hscontrol/db/db_test.go` (+639 -118) 📝 `hscontrol/db/ephemeral_garbage_collector_test.go` (+22 -20) _...and 80 more files_ </details> ### 📄 Description ~This PR is kind of like a "stretch goal"~. This PR is very much needed to try to address the longstanding issues the database schema and data integrity has. This adds a `schema.sql` which should be the source of truth representation of our schema after all migration is applied (for sqlite). This is to ensure you can look one place to figure out how the current schema is defined and not have to do the patchwork of migrations in your head to get an overview. Thanks to @nblock for discovering this. ### How did we get here? As far as I have understood, this is a combination of several things, parts of things we did not know at the time, didn't think about and naivety. I have at least identified the following: **SQLite requires foreign key to be set on each connection** Essentially, from the early birth of Headscale, we added constraint to the schema via GORM, but for many many early versions, they were never enforced because SQLite was started _without_ setting the [`PRAGMA foreign_keys`](https://sqlite.org/pragma.html#pragma_foreign_keys), meaning it will just ignore them all. Running for years without any enforcing likely let all sorts of odd relationships of data creep in, causing havoc when we eventually enabled it. This has since been addressed, and from this PR, we will also enforce foreign keys for all migrations (we had to turn it off for some old migrations as they are a bit all over the place). **All migrations was done in ONE function** Until Headscale version `0.23.0`, all migrations was just one big function, what does that mean? it meant that no matter which migrations version you started at, or if you had parts of the migrations applied previously, Headscale had no clue and just tried to do them all. This is not very smart, and let us say, it is amazing that it has not ended up _worse_ than it has. This has since been addressed and all changes are now versioned and runs only if needed. **GORM AutoMigrate is a bit of a trap** The [ORM](https://en.wikipedia.org/wiki/Object–relational_mapping) Headscale uses, called [GORM](https://gorm.io) has a potentially very nice migration helper called [AutoMigrate](https://gorm.io/docs/migration.html), simplified, it will look at the Model (your go struct) and the current table, calculate the difference and make a migration path. The problem here is that if you change a model multiple times over several versions and call AutoMigrate in the first migration, then it will apply changes that are even further ahead than it is suppose to. It does not really use the "snap shot" of the Model from the time of the code base where the migration was written. This is kind of mind field, because you can end up trying to apply a migration in between two AutoMigrate call that depends on the data being in the shape of an older model, before upgrading to the final state. I am sure we "used AutoMigrate wrong", and it is probably even covered in the docs, but since we often try to clean up old code, we do not want to have a lot of deprecated fields on our models. It just does not fit our code. In addition, the opaque-ness of using AutoMigrate hides away "what is actually changing" for someone who comes back to the migration later and needs to debug something or wants to understand what goes on. To handle this, from this PR, we will no longer accept the use of AutoMigrate moving forward, and all migrations will have to be explicit. ### Why is this PR 6000+ lines We are doing a very large migration that essentially rebuilds all of the SQLite tables to ensure they are equal to the new truth schema on disk. As some tables out there might have undeleted columns that is not used anymore (AutoMigrate does not remove columns), we have opted for rebuilding. To make sure this is safe, we have done the following: - For every version of headscale from 0.2.0 to 0.26.1 we have - extracted the schema at the time - dumped a database (empty, but with partial migrations) - test that it can be migrated to the newest schema and matches - For the other databases we have got hold of with previously "broken" migrations - We test migration to the new version - We try to validate that the data is as expected when it has finished migration **Note:** None of these changes will apply to Postgres which is in a "maintenance only" mode. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
adam added the pull-request label 2025-12-29 04:18:53 +01:00
adam closed this issue 2025-12-29 04:18:53 +01:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/headscale#2764