[PR #3996] [MERGED] Improve podcast library page query performance on title, titleIgnorePrefix, and addedAt sort orders #4142

New Issue

2026-04-25T00:18:29+02:00

adam commented

2026-04-25 00:18:29 +02:00

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/3996
Author: @mikiher
Created: 2/16/2025
Status: ✅ Merged
Merged: 2/19/2025
Merged by: @advplyr

Base: master ← Head: optimize-podcast-queries

📝 Commits (10+)

23a7502 Add migration in preparation for podcast query optimization
e2f1aee Add numEpisodes to podcast model
7282afc Add podcastId to mediaProgress model
f1de307 Update cached user whenever mediaProgress is removed
da8fd2d Set podcastId when mediaProgress is created
f1e46a3 Separate feed query from podcasts page query
2e48ec0 Use libraryItem.title[IgnorePrefix] for sorting podcasts page query
707533d Remove numEpisodes subquery from podcasst page query
cb9fc3e Replace numEpisodesIncomplete subquery with cached user progress calculation
bd4f48e Add required: true to includes in podcast episodes page query

📊 Changes

13 files changed (+700 additions, -100 deletions)

View changed files

📝 server/Database.js (+4 -0)
📝 server/controllers/PodcastController.js (+7 -1)
📝 server/managers/PodcastManager.js (+8 -1)
📝 server/migrations/changelog.md (+1 -0)
➕ server/migrations/v2.19.4-improve-podcast-queries.js (+219 -0)
📝 server/models/MediaProgress.js (+14 -1)
📝 server/models/Podcast.js (+13 -1)
📝 server/models/PodcastEpisode.js (+9 -1)
📝 server/models/User.js (+11 -0)
📝 server/scanner/PodcastScanner.js (+86 -69)
📝 server/utils/queries/libraryFilters.js (+6 -3)
📝 server/utils/queries/libraryItemsPodcastFilters.js (+57 -23)
➕ test/server/migrations/v2.19.4-improve-podcast-queries.test.js (+265 -0)

📄 Description

Brief summary

This PR tries to optimize some of the podcast library page load and scrolling database queries, following up on what's been done for the book library in #3952

Which issue is fixed?

Fixes #3965

In-depth Description

Podcast library page queries are more complex than book library page queries, because they aggregate data from podcast episodes, specifically numEpisodes (the number of episodes that the podcast has), and numEpisodesIncomplete (the number of episodes that the current user has not yet finished listening to).

Before this change, these two virtual columns were obtained via per-row subqueries, making the podcast queries very inefficient, leading to very high library page query latency on large podcast libraries. In addition, the queries suffered from the same main issue that caused high latency in the book library page queries, namely the separation between the libraryItems and podcasts` tables.

Resolution

The following changes affecting the main podcast library page query were made:

Similar changes to the ones made for books in #3952 were implemented for podcasts
- Copied the title/titleIgnorePrefix columns to libraryItems
- Set the title/titleIgnorePrefix when a podcast library item is created
- Added triggers to update those when the original columns change.
- Separated the feed query
- Added a count cache in libraryItemsPodcastFilters
- Added environment-variable controlled profilling
Added a numEpisodes column to the podcasts table
- Populated during database migration
- Kept updated as podcast episodes are added and removed
- The per-row subquery for numEpisodes could now be removed
Added a podcastId column to the mediaProgresses table
- Populated during database migration
- Set when the mediaProgress record is created
- The per-row subquery for numEpisodesIncomplete could now be removed
  - Instead, numEpisodesComplete is calculated in-memory using the cached user record
  - numEpisodesIncomplete = numEpisodes - numEpisodesComplete
- The cached user record is updated whenever mediaProgress is created or destroyed

In addition, a couple of other small changes were implemented:

Added environment-variable controlled profilling for podcast episodes queries (used mainly in the podcast library homepage)
Added an ANALYZE database query at database initialization
- This query updates statistics about the database tables, indices, and their sizes
- It's recommended to run it mostly after large bulk updates and schema changes
- Usually takes no more than a second to run, so opted to run it during server startup.
  - After scanning a very large podcast library, running it caused the query plan for some of the database queries to change, improving those queries significantly

How have you tested this?

Performance

The issues described above tend to be more noticable in large podcast libraries, so I tested this on a synthetic podcast library containing 1000 podcast and 130,000 podcast episodes. Like in the previous PR, I ran tests on a Synology 920+ NAS, which has a relatively weak hardware.

No non-default pragma values were applied.

I tested on the same sort orders as in #3952

title
titleIgnorePrefix
addedAt

Other sort orders were not optimized.

I'll post more detailed results later, but the overall effect on latency is very large:

When podcast library page is sorted by title:
- Before this PR (edge image) - the first few 35-podcast-per-page page took 15-20 seconds per page to load. the next few took more than 2 minutes per-page.
- After the change - First 35-podcast-per-page page took ~150 milliseconds to load. the next few pages took between 50-150 milliseconds.

This is an overall drop of ~99% or more in podcast library page load time (roughly similar to what we see in #3952).

Correctness

Migration unit tested and tested on a real database
Played a lot with podcast, podcast episode, and media progress addition and removal
- Made sure the new column values are updated correctly
- Made sure count cache is invalidated as expected
- Made sure numPodcasts and numPodcastsIncomplete are properly calculated/updaed.

_{🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.}

## 📋 Pull Request Information **Original PR:** https://github.com/advplyr/audiobookshelf/pull/3996 **Author:** [@mikiher](https://github.com/mikiher) **Created:** 2/16/2025 **Status:** ✅ Merged **Merged:** 2/19/2025 **Merged by:** [@advplyr](https://github.com/advplyr) **Base:** `master` ← **Head:** `optimize-podcast-queries` --- ### 📝 Commits (10+) - [`23a7502`](https://github.com/advplyr/audiobookshelf/commit/23a750214fcd0927d9bbbde52dcb4f17d040455d) Add migration in preparation for podcast query optimization - [`e2f1aee`](https://github.com/advplyr/audiobookshelf/commit/e2f1aeed757967ab4b068bf6486a225e30f37747) Add numEpisodes to podcast model - [`7282afc`](https://github.com/advplyr/audiobookshelf/commit/7282afcfded6d2adb34bf25fc5001f4a11c7acb8) Add podcastId to mediaProgress model - [`f1de307`](https://github.com/advplyr/audiobookshelf/commit/f1de307bf9455d7209ef9c68282f35bccdf870c2) Update cached user whenever mediaProgress is removed - [`da8fd2d`](https://github.com/advplyr/audiobookshelf/commit/da8fd2d9d5ceace0c3ba76e985aadbb58730615c) Set podcastId when mediaProgress is created - [`f1e46a3`](https://github.com/advplyr/audiobookshelf/commit/f1e46a351bc6a2fb062b087406cf39f2fa223979) Separate feed query from podcasts page query - [`2e48ec0`](https://github.com/advplyr/audiobookshelf/commit/2e48ec0dde68b2b6e08896cdc3d456e247c96cac) Use libraryItem.title[IgnorePrefix] for sorting podcasts page query - [`707533d`](https://github.com/advplyr/audiobookshelf/commit/707533df8f09979f4ea215698ed4101fb4ca3a9e) Remove numEpisodes subquery from podcasst page query - [`cb9fc3e`](https://github.com/advplyr/audiobookshelf/commit/cb9fc3e0d141efec4e7b3823fffd0247e62a349f) Replace numEpisodesIncomplete subquery with cached user progress calculation - [`bd4f48e`](https://github.com/advplyr/audiobookshelf/commit/bd4f48ec3944faa88783dd2c03ebc157f766652e) Add required: true to includes in podcast episodes page query ### 📊 Changes **13 files changed** (+700 additions, -100 deletions) <details> <summary>View changed files</summary> 📝 `server/Database.js` (+4 -0) 📝 `server/controllers/PodcastController.js` (+7 -1) 📝 `server/managers/PodcastManager.js` (+8 -1) 📝 `server/migrations/changelog.md` (+1 -0) ➕ `server/migrations/v2.19.4-improve-podcast-queries.js` (+219 -0) 📝 `server/models/MediaProgress.js` (+14 -1) 📝 `server/models/Podcast.js` (+13 -1) 📝 `server/models/PodcastEpisode.js` (+9 -1) 📝 `server/models/User.js` (+11 -0) 📝 `server/scanner/PodcastScanner.js` (+86 -69) 📝 `server/utils/queries/libraryFilters.js` (+6 -3) 📝 `server/utils/queries/libraryItemsPodcastFilters.js` (+57 -23) ➕ `test/server/migrations/v2.19.4-improve-podcast-queries.test.js` (+265 -0) </details> ### 📄 Description ## Brief summary This PR tries to optimize some of the podcast library page load and scrolling database queries, following up on what's been done for the book library in #3952 ## Which issue is fixed? Fixes #3965 ## In-depth Description Podcast library page queries are more complex than book library page queries, because they aggregate data from podcast episodes, specifically `numEpisodes` (the number of episodes that the podcast has), and `numEpisodesIncomplete` (the number of episodes that the current user has not yet finished listening to). Before this change, these two virtual columns were obtained via per-row subqueries, making the podcast queries very inefficient, leading to very high library page query latency on large podcast libraries. In addition, the queries suffered from the same main issue that caused high latency in the book library page queries, namely the separation between the `libraryItems and `podcasts` tables. ### Resolution The following changes affecting the main podcast library page query were made: - Similar changes to the ones made for books in #3952 were implemented for podcasts - Copied the title/titleIgnorePrefix columns to `libraryItems` - Set the title/titleIgnorePrefix when a podcast library item is created - Added triggers to update those when the original columns change. - Separated the feed query - Added a count cache in `libraryItemsPodcastFilters` - Added environment-variable controlled profilling - Added a `numEpisodes` column to the `podcasts` table - Populated during database migration - Kept updated as podcast episodes are added and removed - The per-row subquery for `numEpisodes` could now be removed - Added a `podcastId` column to the `mediaProgresses` table - Populated during database migration - Set when the mediaProgress record is created - The per-row subquery for `numEpisodesIncomplete` could now be removed - Instead, `numEpisodesComplete` is calculated in-memory using the cached user record - `numEpisodesIncomplete = numEpisodes - numEpisodesComplete` - The cached user record is updated whenever mediaProgress is created or destroyed In addition, a couple of other small changes were implemented: - Added environment-variable controlled profilling for podcast episodes queries (used mainly in the podcast library homepage) - Added an `ANALYZE` database query at database initialization - This query updates statistics about the database tables, indices, and their sizes - It's recommended to run it mostly after large bulk updates and schema changes - Usually takes no more than a second to run, so opted to run it during server startup. - After scanning a very large podcast library, running it caused the query plan for some of the database queries to change, improving those queries significantly ## How have you tested this? ### Performance The issues described above tend to be more noticable in large podcast libraries, so I tested this on a synthetic podcast library containing 1000 podcast and 130,000 podcast episodes. Like in the previous PR, I ran tests on a Synology 920+ NAS, which has a relatively weak hardware. No non-default pragma values were applied. I tested on the same sort orders as in #3952 - title - titleIgnorePrefix - addedAt Other sort orders were not optimized. I'll post more detailed results later, but the overall effect on latency is very large: - When podcast library page is sorted by title: - Before this PR (edge image) - the first few 35-podcast-per-page page took 15-20 seconds per page to load. the next few took more than 2 minutes per-page. - After the change - First 35-podcast-per-page page took ~150 milliseconds to load. the next few pages took between 50-150 milliseconds. This is an overall drop of ~99% or more in podcast library page load time (roughly similar to what we see in #3952). ### Correctness - Migration unit tested and tested on a real database - Played a lot with podcast, podcast episode, and media progress addition and removal - Made sure the new column values are updated correctly - Made sure count cache is invalidated as expected - Made sure `numPodcasts` and `numPodcastsIncomplete` are properly calculated/updaed. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>

adam added the pull-request label 2026-04-25 00:18:29 +02:00

adam closed this issue

2026-04-25 00:18:29 +02:00

Sign in to join this conversation.

Branches Tags

master

book_tags_genres_dedupe

episode_download_fallback

Issue-4540-SortBy-StartedDate-and-FinishedDate

episode_meta_tagging

fix_authorize_race_condition

redirect_transcode_requests

progress_updated_sort

fix_ereader_socket_event

fix_change_empty_root_password

fix_podcast_session_track_index

fix_set_token

session_modal_user

localize_durations

fix_oidc_create_user

jwt_auth_refactor

fix_scanner_deleting_single_file_books

fix_mediaprogress_updatedat_2

experimental_next_client

podcast_episode_duration

episode-timestamps-clickable

book_author_secondary_sort_title

podcast_useragents

pathexists_user_access

fix_pathexists_join

book_author_secondary_sort

clean_duplicate_mediaprogress

sanitize_html_description

trix_prevent_attachments

check_path_api_fix

fix_mediaprogress_updatedat

increase_express_json_limit

fix_dockerfile_nunicode

search_episodes

audiobook_tools_update

episode_secondary_sorts

hls_stream_url_update

new_session_track_endpoint

audiobook_tools_enhancements

watcher_rescans_update

player_track_tooltip

fix_exclude_prefixes_crash

socket_item_events

fix_podcast_episode_scanner_promise

new_stats_controller

count_cache_for_userpermissions

parsing-opf-v3

validate_migration_files

fix-quick-match-all-crash

fix-chapter-end-sleep-timer

stringify_sequelize_query

remove-col-ambiguity

fix_next_prev_edit_description

details_trim_whitespace

fix_content_url_basepath

fix_logger_fatal

progress_bar_visibility

batch-edit-populate-map-details

feed_generator_updates

bookmark-modal-updates

migrate-library-item-in-scanner

migrate-new-library-items

migrate-podcasts-new-library-item-2

migrate-podcasts-new-library-item

fix-remove-episode-from-playlist

playback-session-use-new-library-item

refactor-library-item

fix-heatmap-caption

feed-episodes-upsert

share-media-player-media-session-api

remove-old-playlist

remove_old_collection_object

plugin-implementation-demo

feed_migration

refactor-feeds-from-item

fix_remove_authors_no_books

v2.17.3-fk-constraints-migration

migrations-first-upgrade

sqlite_2

feature/nuxt-target-server

waveform

sqlite

playlists

video

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: starred/audiobookshelf#4142