[PR #3996] [MERGED] Improve podcast library page query performance on title, titleIgnorePrefix, and addedAt sort orders #4142

Closed
opened 2026-04-25 00:18:29 +02:00 by adam · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/3996
Author: @mikiher
Created: 2/16/2025
Status: Merged
Merged: 2/19/2025
Merged by: @advplyr

Base: masterHead: optimize-podcast-queries


📝 Commits (10+)

  • 23a7502 Add migration in preparation for podcast query optimization
  • e2f1aee Add numEpisodes to podcast model
  • 7282afc Add podcastId to mediaProgress model
  • f1de307 Update cached user whenever mediaProgress is removed
  • da8fd2d Set podcastId when mediaProgress is created
  • f1e46a3 Separate feed query from podcasts page query
  • 2e48ec0 Use libraryItem.title[IgnorePrefix] for sorting podcasts page query
  • 707533d Remove numEpisodes subquery from podcasst page query
  • cb9fc3e Replace numEpisodesIncomplete subquery with cached user progress calculation
  • bd4f48e Add required: true to includes in podcast episodes page query

📊 Changes

13 files changed (+700 additions, -100 deletions)

View changed files

📝 server/Database.js (+4 -0)
📝 server/controllers/PodcastController.js (+7 -1)
📝 server/managers/PodcastManager.js (+8 -1)
📝 server/migrations/changelog.md (+1 -0)
server/migrations/v2.19.4-improve-podcast-queries.js (+219 -0)
📝 server/models/MediaProgress.js (+14 -1)
📝 server/models/Podcast.js (+13 -1)
📝 server/models/PodcastEpisode.js (+9 -1)
📝 server/models/User.js (+11 -0)
📝 server/scanner/PodcastScanner.js (+86 -69)
📝 server/utils/queries/libraryFilters.js (+6 -3)
📝 server/utils/queries/libraryItemsPodcastFilters.js (+57 -23)
test/server/migrations/v2.19.4-improve-podcast-queries.test.js (+265 -0)

📄 Description

Brief summary

This PR tries to optimize some of the podcast library page load and scrolling database queries, following up on what's been done for the book library in #3952

Which issue is fixed?

Fixes #3965

In-depth Description

Podcast library page queries are more complex than book library page queries, because they aggregate data from podcast episodes, specifically numEpisodes (the number of episodes that the podcast has), and numEpisodesIncomplete (the number of episodes that the current user has not yet finished listening to).

Before this change, these two virtual columns were obtained via per-row subqueries, making the podcast queries very inefficient, leading to very high library page query latency on large podcast libraries. In addition, the queries suffered from the same main issue that caused high latency in the book library page queries, namely the separation between the libraryItems and podcasts` tables.

Resolution

The following changes affecting the main podcast library page query were made:

  • Similar changes to the ones made for books in #3952 were implemented for podcasts
    • Copied the title/titleIgnorePrefix columns to libraryItems
    • Set the title/titleIgnorePrefix when a podcast library item is created
    • Added triggers to update those when the original columns change.
    • Separated the feed query
    • Added a count cache in libraryItemsPodcastFilters
    • Added environment-variable controlled profilling
  • Added a numEpisodes column to the podcasts table
    • Populated during database migration
    • Kept updated as podcast episodes are added and removed
    • The per-row subquery for numEpisodes could now be removed
  • Added a podcastId column to the mediaProgresses table
    • Populated during database migration
    • Set when the mediaProgress record is created
    • The per-row subquery for numEpisodesIncomplete could now be removed
      • Instead, numEpisodesComplete is calculated in-memory using the cached user record
      • numEpisodesIncomplete = numEpisodes - numEpisodesComplete
    • The cached user record is updated whenever mediaProgress is created or destroyed

In addition, a couple of other small changes were implemented:

  • Added environment-variable controlled profilling for podcast episodes queries (used mainly in the podcast library homepage)
  • Added an ANALYZE database query at database initialization
    • This query updates statistics about the database tables, indices, and their sizes
    • It's recommended to run it mostly after large bulk updates and schema changes
    • Usually takes no more than a second to run, so opted to run it during server startup.
      • After scanning a very large podcast library, running it caused the query plan for some of the database queries to change, improving those queries significantly

How have you tested this?

Performance

The issues described above tend to be more noticable in large podcast libraries, so I tested this on a synthetic podcast library containing 1000 podcast and 130,000 podcast episodes. Like in the previous PR, I ran tests on a Synology 920+ NAS, which has a relatively weak hardware.

No non-default pragma values were applied.

I tested on the same sort orders as in #3952

  • title
  • titleIgnorePrefix
  • addedAt

Other sort orders were not optimized.

I'll post more detailed results later, but the overall effect on latency is very large:

  • When podcast library page is sorted by title:
    • Before this PR (edge image) - the first few 35-podcast-per-page page took 15-20 seconds per page to load. the next few took more than 2 minutes per-page.
    • After the change - First 35-podcast-per-page page took ~150 milliseconds to load. the next few pages took between 50-150 milliseconds.

This is an overall drop of ~99% or more in podcast library page load time (roughly similar to what we see in #3952).

Correctness

  • Migration unit tested and tested on a real database
  • Played a lot with podcast, podcast episode, and media progress addition and removal
    • Made sure the new column values are updated correctly
    • Made sure count cache is invalidated as expected
    • Made sure numPodcasts and numPodcastsIncomplete are properly calculated/updaed.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/advplyr/audiobookshelf/pull/3996 **Author:** [@mikiher](https://github.com/mikiher) **Created:** 2/16/2025 **Status:** ✅ Merged **Merged:** 2/19/2025 **Merged by:** [@advplyr](https://github.com/advplyr) **Base:** `master` ← **Head:** `optimize-podcast-queries` --- ### 📝 Commits (10+) - [`23a7502`](https://github.com/advplyr/audiobookshelf/commit/23a750214fcd0927d9bbbde52dcb4f17d040455d) Add migration in preparation for podcast query optimization - [`e2f1aee`](https://github.com/advplyr/audiobookshelf/commit/e2f1aeed757967ab4b068bf6486a225e30f37747) Add numEpisodes to podcast model - [`7282afc`](https://github.com/advplyr/audiobookshelf/commit/7282afcfded6d2adb34bf25fc5001f4a11c7acb8) Add podcastId to mediaProgress model - [`f1de307`](https://github.com/advplyr/audiobookshelf/commit/f1de307bf9455d7209ef9c68282f35bccdf870c2) Update cached user whenever mediaProgress is removed - [`da8fd2d`](https://github.com/advplyr/audiobookshelf/commit/da8fd2d9d5ceace0c3ba76e985aadbb58730615c) Set podcastId when mediaProgress is created - [`f1e46a3`](https://github.com/advplyr/audiobookshelf/commit/f1e46a351bc6a2fb062b087406cf39f2fa223979) Separate feed query from podcasts page query - [`2e48ec0`](https://github.com/advplyr/audiobookshelf/commit/2e48ec0dde68b2b6e08896cdc3d456e247c96cac) Use libraryItem.title[IgnorePrefix] for sorting podcasts page query - [`707533d`](https://github.com/advplyr/audiobookshelf/commit/707533df8f09979f4ea215698ed4101fb4ca3a9e) Remove numEpisodes subquery from podcasst page query - [`cb9fc3e`](https://github.com/advplyr/audiobookshelf/commit/cb9fc3e0d141efec4e7b3823fffd0247e62a349f) Replace numEpisodesIncomplete subquery with cached user progress calculation - [`bd4f48e`](https://github.com/advplyr/audiobookshelf/commit/bd4f48ec3944faa88783dd2c03ebc157f766652e) Add required: true to includes in podcast episodes page query ### 📊 Changes **13 files changed** (+700 additions, -100 deletions) <details> <summary>View changed files</summary> 📝 `server/Database.js` (+4 -0) 📝 `server/controllers/PodcastController.js` (+7 -1) 📝 `server/managers/PodcastManager.js` (+8 -1) 📝 `server/migrations/changelog.md` (+1 -0) ➕ `server/migrations/v2.19.4-improve-podcast-queries.js` (+219 -0) 📝 `server/models/MediaProgress.js` (+14 -1) 📝 `server/models/Podcast.js` (+13 -1) 📝 `server/models/PodcastEpisode.js` (+9 -1) 📝 `server/models/User.js` (+11 -0) 📝 `server/scanner/PodcastScanner.js` (+86 -69) 📝 `server/utils/queries/libraryFilters.js` (+6 -3) 📝 `server/utils/queries/libraryItemsPodcastFilters.js` (+57 -23) ➕ `test/server/migrations/v2.19.4-improve-podcast-queries.test.js` (+265 -0) </details> ### 📄 Description ## Brief summary This PR tries to optimize some of the podcast library page load and scrolling database queries, following up on what's been done for the book library in #3952 ## Which issue is fixed? Fixes #3965 ## In-depth Description Podcast library page queries are more complex than book library page queries, because they aggregate data from podcast episodes, specifically `numEpisodes` (the number of episodes that the podcast has), and `numEpisodesIncomplete` (the number of episodes that the current user has not yet finished listening to). Before this change, these two virtual columns were obtained via per-row subqueries, making the podcast queries very inefficient, leading to very high library page query latency on large podcast libraries. In addition, the queries suffered from the same main issue that caused high latency in the book library page queries, namely the separation between the `libraryItems and `podcasts` tables. ### Resolution The following changes affecting the main podcast library page query were made: - Similar changes to the ones made for books in #3952 were implemented for podcasts - Copied the title/titleIgnorePrefix columns to `libraryItems` - Set the title/titleIgnorePrefix when a podcast library item is created - Added triggers to update those when the original columns change. - Separated the feed query - Added a count cache in `libraryItemsPodcastFilters` - Added environment-variable controlled profilling - Added a `numEpisodes` column to the `podcasts` table - Populated during database migration - Kept updated as podcast episodes are added and removed - The per-row subquery for `numEpisodes` could now be removed - Added a `podcastId` column to the `mediaProgresses` table - Populated during database migration - Set when the mediaProgress record is created - The per-row subquery for `numEpisodesIncomplete` could now be removed - Instead, `numEpisodesComplete` is calculated in-memory using the cached user record - `numEpisodesIncomplete = numEpisodes - numEpisodesComplete` - The cached user record is updated whenever mediaProgress is created or destroyed In addition, a couple of other small changes were implemented: - Added environment-variable controlled profilling for podcast episodes queries (used mainly in the podcast library homepage) - Added an `ANALYZE` database query at database initialization - This query updates statistics about the database tables, indices, and their sizes - It's recommended to run it mostly after large bulk updates and schema changes - Usually takes no more than a second to run, so opted to run it during server startup. - After scanning a very large podcast library, running it caused the query plan for some of the database queries to change, improving those queries significantly ## How have you tested this? ### Performance The issues described above tend to be more noticable in large podcast libraries, so I tested this on a synthetic podcast library containing 1000 podcast and 130,000 podcast episodes. Like in the previous PR, I ran tests on a Synology 920+ NAS, which has a relatively weak hardware. No non-default pragma values were applied. I tested on the same sort orders as in #3952 - title - titleIgnorePrefix - addedAt Other sort orders were not optimized. I'll post more detailed results later, but the overall effect on latency is very large: - When podcast library page is sorted by title: - Before this PR (edge image) - the first few 35-podcast-per-page page took 15-20 seconds per page to load. the next few took more than 2 minutes per-page. - After the change - First 35-podcast-per-page page took ~150 milliseconds to load. the next few pages took between 50-150 milliseconds. This is an overall drop of ~99% or more in podcast library page load time (roughly similar to what we see in #3952). ### Correctness - Migration unit tested and tested on a real database - Played a lot with podcast, podcast episode, and media progress addition and removal - Made sure the new column values are updated correctly - Made sure count cache is invalidated as expected - Made sure `numPodcasts` and `numPodcastsIncomplete` are properly calculated/updaed. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
adam added the pull-request label 2026-04-25 00:18:29 +02:00
adam closed this issue 2026-04-25 00:18:29 +02:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/audiobookshelf#4142