[PR #2103] [MERGED] Scan for empty book series more efficiently #3655

Closed
opened 2026-04-25 00:16:31 +02:00 by adam · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/2103
Author: @selfhost-alt
Created: 9/15/2023
Status: Merged
Merged: 9/17/2023
Merged by: @advplyr

Base: masterHead: faster-scan-for-empty-series


📝 Commits (4)

  • 8b39b01 Scan for empty book series more efficiently
  • 19cf3bf Fix query to actually return empty series
  • 3ad4f05 Merge branch 'master' into faster-scan-for-empty-series
  • 87eaace Fix empty podcast and empty book queries when cleaning db on init

📊 Changes

1 file changed (+15 additions, -3 deletions)

View changed files

📝 server/Database.js (+15 -3)

📄 Description

I've been trying to figure out why my server was starting so slowly after the Sqlite migration, and I found that the scan for empty book series seems to be the place where it spends the most time during startup.

I believe this slowness is due to the fact that it does a sub-query in the WHERE clause, which gets pretty inefficient when you have a large series table (as I do).

This is the original query that gets run before my change:

SELECT `id`, `name`, `nameIgnorePrefix`, `description`, `createdAt`, `updatedAt`, `libraryId`
FROM `series` AS `series`
WHERE (SELECT count(*) FROM bookSeries bs WHERE bs.seriesId = series.id) = 0;

On my dev container, against a snapshot of my DB (10k series entries, 36k bookSeries entries), this takes just under 10 minutes to complete.

With this PR, the query changes to the following:

SELECT
  `series`.`id`,
  `series`.`name`,
  `series`.`nameIgnorePrefix`,
  `series`.`description`,
  `series`.`createdAt`,
  `series`.`updatedAt`,
  `series`.`libraryId`,
  `bookSeries`.`id` AS `bookSeries.id`,
  `bookSeries`.`sequence` AS `bookSeries.sequence`,
  `bookSeries`.`createdAt` AS `bookSeries.createdAt`,
  `bookSeries`.`bookId` AS `bookSeries.bookId`,
  `bookSeries`.`seriesId` AS `bookSeries.seriesId`
FROM `series` AS `series`
LEFT OUTER JOIN `bookSeries` AS `bookSeries`
ON `series`.`id` = `bookSeries`.`seriesId`
WHERE `bookSeries`.`id` IS NULL;

On my dev container against the same DB snapshot, this query takes about 300ms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/advplyr/audiobookshelf/pull/2103 **Author:** [@selfhost-alt](https://github.com/selfhost-alt) **Created:** 9/15/2023 **Status:** ✅ Merged **Merged:** 9/17/2023 **Merged by:** [@advplyr](https://github.com/advplyr) **Base:** `master` ← **Head:** `faster-scan-for-empty-series` --- ### 📝 Commits (4) - [`8b39b01`](https://github.com/advplyr/audiobookshelf/commit/8b39b01269dd29b0fcb19976ceaa0ce2bac9e7ca) Scan for empty book series more efficiently - [`19cf3bf`](https://github.com/advplyr/audiobookshelf/commit/19cf3bfb9f6c22df0095c65eacdd2d2f2d9f4778) Fix query to actually return empty series - [`3ad4f05`](https://github.com/advplyr/audiobookshelf/commit/3ad4f05449e628e7df1d65d836ad6bf339349302) Merge branch 'master' into faster-scan-for-empty-series - [`87eaace`](https://github.com/advplyr/audiobookshelf/commit/87eaacea220e43df03cf02d0a5b8a870e509a7ae) Fix empty podcast and empty book queries when cleaning db on init ### 📊 Changes **1 file changed** (+15 additions, -3 deletions) <details> <summary>View changed files</summary> 📝 `server/Database.js` (+15 -3) </details> ### 📄 Description I've been trying to figure out why my server was starting so slowly after the Sqlite migration, and I found that the scan for empty book series seems to be the place where it spends the most time during startup. I believe this slowness is due to the fact that it does a sub-query in the `WHERE` clause, which gets pretty inefficient when you have a large `series` table (as I do). This is the original query that gets run before my change: ```sql SELECT `id`, `name`, `nameIgnorePrefix`, `description`, `createdAt`, `updatedAt`, `libraryId` FROM `series` AS `series` WHERE (SELECT count(*) FROM bookSeries bs WHERE bs.seriesId = series.id) = 0; ``` On my dev container, against a snapshot of my DB (10k `series` entries, 36k `bookSeries` entries), this takes just under 10 minutes to complete. With this PR, the query changes to the following: ```sql SELECT `series`.`id`, `series`.`name`, `series`.`nameIgnorePrefix`, `series`.`description`, `series`.`createdAt`, `series`.`updatedAt`, `series`.`libraryId`, `bookSeries`.`id` AS `bookSeries.id`, `bookSeries`.`sequence` AS `bookSeries.sequence`, `bookSeries`.`createdAt` AS `bookSeries.createdAt`, `bookSeries`.`bookId` AS `bookSeries.bookId`, `bookSeries`.`seriesId` AS `bookSeries.seriesId` FROM `series` AS `series` LEFT OUTER JOIN `bookSeries` AS `bookSeries` ON `series`.`id` = `bookSeries`.`seriesId` WHERE `bookSeries`.`id` IS NULL; ``` On my dev container against the same DB snapshot, this query takes about 300ms. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
adam added the pull-request label 2026-04-25 00:16:31 +02:00
adam closed this issue 2026-04-25 00:16:31 +02:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/audiobookshelf#3655