[PR #3594] [MERGED] Increase cache time for filterdata in library #4027

Closed
opened 2026-04-25 00:18:01 +02:00 by adam · 0 comments

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/3594
Author: @nichwall
Created: 11/8/2024
Status: Merged
Merged: 11/16/2024
Merged by: @advplyr

Base: master ← Head: filter_data_longer_cache


📝 Commits (3)

  • 435b7fd Add: check for changes to library items
  • e57d4cc Add: filter update check to podcast libraries
  • e8d8b67 Add: check for deleted items

📊 Changes

1 file changed (+141 additions, -0 deletions)

View changed files

📝 server/utils/queries/libraryFilters.js (+141 -0)

📄 Description

This PR adds logic to the generation of filterData that checks whether the database has changed since the data was cached, which reduces the cold-start time of requesting library information. The current behavior is to cache the filter data for 30 minutes and then clear the cache for filterData regardless of database state. Ideally, we would update the data model to generate filterData more efficiently, but this change should cover most "normal" browsing and use of the server without needing large changes to the data model.

This issue is more pronounced with large libraries, such as the one mentioned in https://github.com/advplyr/audiobookshelf/issues/3525#issuecomment-2442787507, but is still present in smaller libraries (a cold start on my ~500 book test server takes about 2.3 seconds to generate filterData).

I experimented with using individual queries, such as unique queries on the narrators column, instead of a loop in JS. While each individual query was slightly faster than the loop, running 7 individual queries was much slower than loading all of the data and iterating over it in JavaScript. The tags and genres queries were also slower when iterating over individual JSON elements in SQL, although it is possible to do this directly in a query.
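As a rough illustration of the "load everything and iterate in JavaScript" approach described above, the sketch below collects unique narrators, tags, and genres in a single pass. The item shape and the function name `collectFilterValues` are assumptions made for this example; they are not taken from `libraryFilters.js`.

```javascript
// Minimal sketch: one pass over already-loaded rows, collecting unique values.
// The item shape ({ narrators, tags, genres } as string arrays) is hypothetical.
function collectFilterValues(items) {
  const narrators = new Set()
  const tags = new Set()
  const genres = new Set()

  for (const item of items) {
    // Missing fields are treated as empty lists
    for (const narrator of item.narrators ?? []) narrators.add(narrator)
    for (const tag of item.tags ?? []) tags.add(tag)
    for (const genre of item.genres ?? []) genres.add(genre)
  }

  // Sets deduplicate; sorting gives a stable order for display
  return {
    narrators: [...narrators].sort(),
    tags: [...tags].sort(),
    genres: [...genres].sort()
  }
}
```

The point of the single loop is that the rows are fetched once and every filter category is filled from the same iteration, rather than paying the overhead of one query per category.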

In the end, I determined that during normal usage, filterData doesn't really change. The main reason for the 30 minute timeout is so that when metadata changes, the filter list in the library view is updated in a reasonable amount of time.
(image: https://github.com/user-attachments/assets/2545c527-9456-4521-b7d7-039c95270d23)

To simplify this, I added a check for any rows in the relevant tables that have been updated since the cache was created. If no rows are returned, the cache creation time is updated so the cache stays valid for an additional 30 minutes. If any of the tables has a more recent updatedAt, we regenerate filterData as was already done.
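The decision described above can be sketched as a small pure function. This is a simplified model, not the code from the PR: the cache record shape (`loadedAt`, `data`), the `latestUpdatedAt` argument (standing in for the newest `updatedAt` found across the relevant tables), and the `regenerate` callback are all assumptions for illustration.

```javascript
const CACHE_TTL_MS = 30 * 60 * 1000 // the 30 minute window described above

// cache: hypothetical { loadedAt: number, data: any } record, or null if nothing is cached
// latestUpdatedAt: newest updatedAt (ms) across the relevant tables, or null if there are no rows
// now: current time in ms
// regenerate: hypothetical callback that rebuilds filterData from the database
function getFilterData(cache, latestUpdatedAt, now, regenerate) {
  // Still inside the 30 minute window: use the cache without checking the database
  if (cache && now - cache.loadedAt < CACHE_TTL_MS) return cache

  const changed = !cache || (latestUpdatedAt !== null && latestUpdatedAt > cache.loadedAt)
  if (changed) {
    // Something was updated after the cache was built: regenerate as before
    return { loadedAt: now, data: regenerate() }
  }

  // Nothing changed: keep the data and restart the 30 minute window
  return { loadedAt: now, data: cache.data }
}
```

In this model the expensive `regenerate` call only runs when a table reports a newer `updatedAt`; otherwise an expired cache is revalidated by bumping its timestamp.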

Further work before it's ready:

This does not handle deleted items, because a deleted row no longer exists to have its updatedAt column checked.

The solutions I am investigating:

  • Add a migration for these tables to add the paranoid attribute (https://sequelize.org/docs/v6/core-concepts/paranoid/), then clean up any soft-deleted rows when the filterData is checked. This will likely complicate other logic because the soft-deleted rows still show up in raw queries.
  • Add a migration to the library table to add entityLastUpdated columns or similar to show the last time something was updated/deleted. This is a bit more brittle and requires updating another table whenever a change is made.
  • Add more fields to the filterData array to store the count of books, authors, series, and podcasts for the library, then check these values to ensure no items were deleted (most promising, local change only).
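The third option, storing counts alongside the cached data, could look roughly like the sketch below. The field names (`books`, `authors`, `series`, `podcasts`) and the function `countsMatch` are assumptions for illustration, not the names used in the merged change.

```javascript
// Minimal sketch of a count-based deletion check.
// Both arguments are hypothetical { books, authors, series, podcasts } records:
// cachedCounts was stored when filterData was generated, currentCounts is freshly queried.
function countsMatch(cachedCounts, currentCounts) {
  const keys = ['books', 'authors', 'series', 'podcasts']
  // A missing count is treated as 0 so book-only and podcast-only libraries compare cleanly
  return keys.every((key) => (cachedCounts[key] ?? 0) === (currentCounts[key] ?? 0))
}
```

If `countsMatch` returns false, a row was added or removed and the cache would be regenerated. A deletion offset by an insertion leaves the counts equal, so this check is meant to complement the updatedAt check rather than replace it.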

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

adam added the pull-request label 2026-04-25 00:18:01 +02:00
adam closed this issue 2026-04-25 00:18:01 +02:00

Reference: starred/audiobookshelf#4027