[Feature] Possibility of using a larger database connection pool (while still using the pure Go SQLite implementation) #1011

Closed
opened 2025-12-29 02:27:25 +01:00 by adam · 13 comments

Originally created by @aergus-tng on GitHub (May 6, 2025).

Use case

Under certain loads, having only a single connection becomes a bottleneck. We observed that the server is more responsive under high load when more connections are allowed.

Description

In hscontrol/db/db.go, there is the following code.

```go
// The pure Go SQLite library does not handle locking in
// the same way as the C based one and we can't use the gorm
// connection pool as of 2022/02/23.
sqlDB, _ := db.DB()
sqlDB.SetMaxIdleConns(1)
sqlDB.SetMaxOpenConns(1)
```

This goes back to #349, where there are reports of SQLITE_BUSY errors.

While I cannot pinpoint which changes in glebarez/sqlite could have made a difference, in our tests with the current version of glebarez/sqlite (v1.11.0) the Headscale server operated fine with a larger connection pool, so the number of connections could be made configurable.

Contribution

  • I can write the design doc for this feature
  • I can contribute this feature

How can it be implemented?

The maximum number of idle and open connections could be made configurable via environment variables or as part of the config file (under the tuning key?). The default value for both could be 1 to preserve the current behavior.
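
For illustration, the hard-coded limits shown above could then be replaced with values taken from the config. This is only a sketch under those assumptions; the function and parameter names are illustrative, not actual Headscale code:

```go
package db

import "gorm.io/gorm"

// applyPoolSettings applies configurable pool sizes instead of the
// hard-coded 1/1. Both values would default to 1 to match the current
// behavior. Illustrative sketch, not actual Headscale code.
func applyPoolSettings(db *gorm.DB, maxOpen, maxIdle int) error {
	sqlDB, err := db.DB()
	if err != nil {
		return err
	}
	sqlDB.SetMaxOpenConns(maxOpen)
	sqlDB.SetMaxIdleConns(maxIdle)
	return nil
}
```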

adam added the enhancement label 2025-12-29 02:27:25 +01:00
adam closed this issue 2025-12-29 02:27:28 +01:00

@kradalby commented on GitHub (May 6, 2025):

hmm, I am sceptical, mainly because the error handling has not really been tuned to handle SQLITE_BUSY errors, which might lead to unexpected behaviour that is hard to recover from.

Even if it is not the default, users have a tendency to find options they don't know how to use, which tends to produce extra support load.

On the bottleneck, SQLite is famously multiple readers, one writer. I don't expect the writer to be our bottleneck, but rather the reader.
I think a more meaningful pattern to explore would be to have (I think I will mess up the terminology here but hopefully the idea gets across) two connection pools:

  • Writer pool, only one connection, can read and write
  • Reader pool, multiple connections, can only read

All the methods in the db package that exclusively read (Get*, List*, etc.) would only use the reader pool and therefore block less.

I am not sure if we would be able to implement this in practice considering GORM and dual database support, but I think it is an interesting pattern for SQLite.
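
A minimal sketch of what that split could look like on top of GORM and glebarez/sqlite; openPools is an illustrative name, not Headscale code, and read-only use of the second pool would have to be enforced by convention in the Get*/List* helpers:

```go
package db

import (
	"github.com/glebarez/sqlite"
	"gorm.io/gorm"
)

// openPools opens two handles to the same SQLite file: a single-connection
// pool for anything that writes, and a larger pool used only for reads.
func openPools(path string) (writer, reader *gorm.DB, err error) {
	writer, err = gorm.Open(sqlite.Open(path), &gorm.Config{})
	if err != nil {
		return nil, nil, err
	}
	wdb, _ := writer.DB()
	wdb.SetMaxOpenConns(1) // all writes serialize on this one connection
	wdb.SetMaxIdleConns(1)

	reader, err = gorm.Open(sqlite.Open(path), &gorm.Config{})
	if err != nil {
		return nil, nil, err
	}
	rdb, _ := reader.DB()
	rdb.SetMaxOpenConns(4) // example size; concurrent reads are fine under WAL
	return writer, reader, nil
}
```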

@aergus-tng commented on GitHub (May 7, 2025):

I have the impression that implementing a single writer / multiple readers scheme on top of the current dual database setup would indeed result in rather complicated code you probably wouldn't be willing to merge. Moreover, the Write-Ahead Logging mode of SQLite seems to already provide more or less that:

> [...] Because writers do nothing that would interfere with the actions of readers, writers and readers can run at the same time. However, since there is only one WAL file, there can only be one writer at a time.

After having a closer look at the SQLite components, my current guess is that the reason why using concurrent connections in Headscale caused issues in 2022 but seems to work now is actually the fact that the WAL mode became the default in 2024 (cf. #1985) -- we do see quite a few SQLITE_BUSY errors when we disable WAL and try to use multiple connections.

According to the documentation (https://sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode), SQLITE_BUSY occurs in WAL mode only in the following edge cases, which Headscale arguably does not need to recover from:

> If another database connection has the database open in exclusive locking mode then all queries against the database will return SQLITE_BUSY. [...]

As Headscale is supposed to have full control over its database, this should not occur if Headscale is configured to use WAL.

> When the last connection to a particular database is closing, that connection will acquire an exclusive lock for a short time while it cleans up the WAL and shared-memory files. If a separate attempt is made to open and query the database while the first connection is still in the middle of its cleanup process, the second connection might get an SQLITE_BUSY error.

AFAICT Headscale (or rather the database/sql driver) keeps idle connections during normal operation, so this is probably only relevant for the cleanup at shutdown, during which it is arguably acceptable that some operations cannot be completed. (In fact, we already occasionally see "sql: database is closed" errors when we shut down the server under high load.)

> If the last connection to a database crashed, then the first new connection to open the database will start a recovery process. An exclusive lock is held during recovery. So if a third database connection tries to jump in and query while the second connection is running recovery, the third connection will get an SQLITE_BUSY error.

This is a one-time "hiccup" that should disappear after Headscale has recovered from the crash. (If connections crash systematically, that's probably an indication of another issue.)

Given this, I would extend my proposal with a configuration validation check that the WAL mode is enabled when multiple database connections are used, which should then be robust enough IMHO. However, if you think that a subset of the aforementioned cases constitutes a problem, I would rather propose handling the SQLITE_BUSY errors (e.g. with a simple retry mechanism) than implementing an additional pooling layer.

PS: The links above point to the documentation of the C implementation of SQLite, but AFAICT the pure Go implementation (modernc.org/sqlite) follows it closely.
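
A minimal sketch of such a validation, assuming hypothetical inputs (a pool size and a WAL flag); the actual Headscale option names would differ:

```go
package db

import "fmt"

// validateSQLitePool is an illustrative sketch, not actual Headscale code:
// it rejects pool sizes above one unless WAL mode is enabled.
func validateSQLitePool(maxOpenConns int, walEnabled bool) error {
	if maxOpenConns > 1 && !walEnabled {
		return fmt.Errorf(
			"sqlite: max_open_conns=%d requires WAL mode to avoid SQLITE_BUSY errors",
			maxOpenConns)
	}
	return nil
}
```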

@kradalby commented on GitHub (May 7, 2025):

The arguments for now vs. then make sense; since we now have WAL, it makes sense that the semantics might be a bit different.

I am still interested in a conservative approach: how do we determine how many connections are safe? I would prefer to stay away from the tuning config, as we don't really want users to use it.

In general, I think it is more meaningful over time to not go via the database all the time, like we do now. We spend so much time unmarshalling JSON and so on, so I think that would be more impactful. But I suppose this is an ok stepping stone.

@aergus-tng commented on GitHub (May 7, 2025):

> I am still interested in a conservative approach: how do we determine how many connections are safe?

I have the impression that with WAL, the number of connections can be scaled quite liberally. For instance, in our tests we used a server with 16 database connections on a 16-core machine and it could support 500 online nodes without SQLITE_BUSY errors.

> I would prefer to stay away from the tuning config, as we don't really want users to use it.

After my original comment I've seen that there is already a precedent in the max_open_conns and max_idle_conns fields of the Postgres configuration; these would basically be the analogous settings for SQLite. After reading the WAL documentation I believe that this is safe enough to expose to users with appropriate documentation and reasonable defaults, but it's of course your call.

> In general, I think it is more meaningful over time to not go via the database all the time, like we do now. We spend so much time unmarshalling JSON and so on, so I think that would be more impactful.

That makes sense. (Though depending on the remaining operations, database parallelism might still be useful in the end.)

> But I suppose this is an ok stepping stone.

What exactly do you mean by "this" here?

@kradalby commented on GitHub (May 9, 2025):

> What exactly do you mean by "this" here?

I think we can do it, and as mentioned Postgres has this. If you want to open a PR, that would be appreciated; as you point out, appropriate docs would be important. I am not sure if we should keep the default at 1 or not. I am happy to be persuaded there, as I do not have the size of infra you have.

As a high-level acceptance criterion, maybe you could add another check to https://github.com/juanfont/headscale/blob/main/integration/scenario.go#L304 to ensure we don't see this error in the integration tests? You could change the default in the integration tests to be higher.

@aergus-tng commented on GitHub (May 9, 2025):

Thanks! I will make a PR, including checks for SQLITE_BUSY in ShutdownAssertNoPanics.

I would expose the options max_open_conns, max_idle_conns and conn_max_idle_time_secs for SQLite as done for Postgres. As these options would then be common to both databases, they could be lifted to the database field of the config. Should I do that or should there be separate fields for connection settings under database.sqlite and database.postgres?
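
For illustration, the shared settings could be shaped roughly like this; the struct and mapstructure tags are assumptions, not the actual Headscale config code:

```go
// DatabaseConnConfig is a hypothetical shape for pool settings shared by
// SQLite and Postgres, mirroring the existing Postgres options.
type DatabaseConnConfig struct {
	MaxOpenConns        int `mapstructure:"max_open_conns"`          // SQLite default: 1
	MaxIdleConns        int `mapstructure:"max_idle_conns"`          // SQLite default: 1
	ConnMaxIdleTimeSecs int `mapstructure:"conn_max_idle_time_secs"` // as for Postgres
}
```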

As for the default pool size:

  • My gut feeling is that runtime.NumCPU() or runtime.GOMAXPROCS(0) would be reasonable because that would allow using all available CPU capacity under high load (and we do see full CPU load in our tests with multiple database connections, for instance when NotifyAll is called with a full update). However, I would understand if you don't want the default to be dynamic (a tiny sketch of the dynamic variant follows after this list).
  • Another option would be to set it to 2 (edit: which is in fact the default for the maximum number of idle connections in database/sql) to benefit from some database concurrency by default while only minimally interfering with existing setups, which might run on single-CPU environments.
  • And the most conservative option would of course be keeping 1 and requiring users who want/need database concurrency to opt in.
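
A minimal sketch of the dynamic variant mentioned in the first bullet, assuming a hypothetical convention where a configured value of 0 means "auto":

```go
package db

import "runtime"

// resolvePoolSize is illustrative only; the 0-means-auto convention is an
// assumption, not actual Headscale behaviour.
func resolvePoolSize(configured int) int {
	if configured > 0 {
		return configured
	}
	// Scale with the number of CPUs the Go runtime will actually use.
	return runtime.GOMAXPROCS(0)
}
```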

@aergus-tng commented on GitHub (May 13, 2025):

Unfortunately, it turned out that the conclusions I drew from the WAL documentation were incorrect and I did observe SQLITE_BUSY errors in integration tests (see e.g. this run: https://github.com/aergus-tng/headscale/actions/runs/14931778050/job/41949601390).

An in-depth discussion of why this happens can be found in this blog post (https://berthub.eu/articles/posts/a-brief-post-on-sqlite3-database-locked-despite-timeout/), but the gist is that there are situations where two parallel writing transactions (where at least one starts as a read-only transaction) cannot both be completed while maintaining the "serializable" isolation level guarantee of SQLite (https://www.sqlite.org/isolation.html), and thus result in a SQLITE_BUSY error.

For Headscale, this seems to happen, for instance, when saving nodes. My guess is that a saving transaction starts as a read-only transaction because it first checks whether a node with the given ID already exists, and is only upgraded to a writing transaction when the node is actually saved, so such transactions cannot be arbitrarily parallelized.

The recommendation in the aforementioned blog post is to explicitly start a transaction with BEGIN IMMEDIATE when it is foreseeable that the transaction will write to the database at some point. AFAICT the abstraction layers used in Headscale (database/sql and GORM) unfortunately do not expose such transaction options (cf. golang/go#19981), and people suggest workarounds like running ROLLBACK; BEGIN IMMEDIATE after starting a transaction, which is far from ideal.
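
For reference, dropping below GORM to a raw database/sql connection would allow issuing BEGIN IMMEDIATE by hand. This is only a hedged sketch of the idea (names are illustrative and the actual write statements are elided), not something Headscale does:

```go
package db

import (
	"context"
	"database/sql"
)

// immediateWrite pins one connection and starts a write (immediate)
// transaction explicitly, bypassing database/sql's deferred BEGIN.
// Illustrative sketch only.
func immediateWrite(ctx context.Context, sqlDB *sql.DB, stmts []string) error {
	conn, err := sqlDB.Conn(ctx)
	if err != nil {
		return err
	}
	defer conn.Close()

	if _, err := conn.ExecContext(ctx, "BEGIN IMMEDIATE"); err != nil {
		return err
	}
	for _, s := range stmts {
		if _, err := conn.ExecContext(ctx, s); err != nil {
			_, _ = conn.ExecContext(ctx, "ROLLBACK")
			return err
		}
	}
	_, err = conn.ExecContext(ctx, "COMMIT")
	return err
}
```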

Under these circumstances, I think that your original suggestion of ensuring that there is a single writer in application code is the best compromise. I will try adding a mutex to make sure that at most one potentially writing transaction is running at any time.
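
A minimal sketch of that single-writer guard on top of GORM; writeMu and Write are illustrative names, not the actual Headscale API:

```go
package db

import (
	"sync"

	"gorm.io/gorm"
)

// writeMu serializes all potentially writing transactions in application code.
var writeMu sync.Mutex

// Write runs fn inside a GORM transaction while holding the global write
// lock, so at most one potentially writing transaction is active at a time.
// Read-only helpers would not take the lock and can run concurrently.
func Write(db *gorm.DB, fn func(tx *gorm.DB) error) error {
	writeMu.Lock()
	defer writeMu.Unlock()
	return db.Transaction(fn)
}
```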

@aergus-tng commented on GitHub (May 28, 2025):

A status update: We see SQLITE_BUSY errors when "too many" connections are allowed (in particular in integration tests with 3 connections on GitHub runners) even when we guard write operations with a mutex.

To investigate whether this is an issue with the SQLite driver/implementation, we ran some tests with the C implementation of SQLite (mattn/go-sqlite3) and its default GORM driver (go-gorm/sqlite) and could not reproduce the busy errors when using a write lock. We therefore suspect that the WAL / single-writer-many-readers implementation of the pure Go port might not be on par with the C implementation, and will open an issue over there if we can pinpoint the exact problem.

As for Headscale, though, I probably won't be able to submit a PR with a correct/stable multi-connection implementation using the Go SQLite implementation anytime soon. FWIW, the current state of my attempts can be found on this branch: https://github.com/aergus-tng/headscale/tree/multiple-sqlite-connections

@kradalby commented on GitHub (May 28, 2025):

I haven't had time to look closely into this, but since the cgo-free SQLite was added, another interesting alternative has popped up that uses WASM instead of transpiling C to Go code; this might be something to check out: https://github.com/ncruces/go-sqlite3

@aergus-tng commented on GitHub (May 30, 2025):

Thanks for the hint, we tried out ncruces/go-sqlite3. It seems to be better in terms of SQLITE_BUSY errors; in particular, some integration tests that never passed with the other Go implementation occasionally pass after some retries (cf. this run: https://github.com/aergus-tng/headscale/actions/runs/15339717539/job/43163477608), but the errors are still not eliminated.

By now we have also observed SQLITE_BUSY errors in some integration test runs (cf. https://github.com/aergus-tng/headscale/actions/runs/15340041732/job/43164332889) when using the CGO implementation.

Judging by these observations, it's probably not an SQLite implementation problem after all (maybe the faster implementations simply require more load for the busy issues to occur?), but I find it hard to reason about what's happening because I don't really know the internals of SQLite. One speculation I have is that there might be some writing/syncing going on after (non-immediate) transactions, so that the application-level lock is released before the database is fully consistent.

In any case, this will probably need some long-term debugging, and there probably won't be much work on this issue from our side in the next weeks (in particular, I personally will be "out of office" for a while).

@aergus-tng commented on GitHub (Aug 26, 2025):

A brief (non-)update: The potential solution I found (starting all transactions that can possibly write to the database as IMMEDIATE) appears to be impossible to implement on top of GORM's abstraction layer. I could not spend much time on this since June and don't think that I will be able to contribute a sufficiently robust implementation in the foreseeable future, so this issue can be closed if no one else is interested in this.

@kradalby commented on GitHub (Sep 9, 2025):

I've merged #2670, which moves a lot of the heavy reads from the database to an in-memory store (for nodes), as mentioned in #2598. We might have moved the bottleneck around a bit, but probably in the right direction.

What do you think about closing this in favour of a "touch the database less" approach?

@aergus-tng commented on GitHub (Sep 15, 2025):

Thanks for implementing an in-memory store, that will surely reduce the DB load. As mentioned before, I'm fine with closing this issue.
