[PR #3767] [MERGED] Book query optimizations #4075

Closed
opened 2026-04-25 00:18:13 +02:00 by adam · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/3767
Author: @mikiher
Created: 1/1/2025
Status: Merged
Merged: 1/1/2025
Merged by: @advplyr

Base: masterHead: book-query-optimizations


📝 Commits (3)

📊 Changes

4 files changed (+90 additions, -3 deletions)

View changed files

📝 server/migrations/changelog.md (+1 -0)
server/migrations/v2.17.7-add-indices.js (+83 -0)
📝 server/models/Book.js (+3 -3)
📝 server/models/LibraryItem.js (+3 -0)

📄 Description

Brief summary

This adds indices to significantly improve database library query time when sorting by size or duration.

Which issue is fixed?

There's no issue. I encountered this while working on PR #3726.

In-depth Description

The following query in libraryItemsBookFilters.js is where most of books fetch time is spent and its performance heavily depends on sortOrder. When fast scrolling on the library page happens, this can lead to choking on the server if a number of page requests are handled concurrently.

    const { rows: books, count } = await Database.bookModel.findAndCountAll({
      where: bookWhere,
      distinct: true,
      attributes: bookAttributes,
      replacements,
      include: [
        {
          model: Database.libraryItemModel,
          required: true,
          where: libraryItemWhere,
          include: libraryItemIncludes
        },
        seriesInclude,
        authorInclude,
        ...bookIncludes
      ],
      order: sortOrder,
      subQuery: false,
      limit: limit || null,
      offset
    })

I fixed a couple of the easy cases by introducing indices on size and duration.
This leads to a significant improvement in the query performance when paging through the library with sortBy size or duration.

I only dealt with size and duration for now because they were the easiest. Other sort orders are either already covered by existing indices, or they're trickier to optimize because of the existing database schema (e.g. sorting by author). Maybe I'll try to improve them in a future PR.

How have you tested this?

I tested by running 35 consecutive page requests, by scrolling down the library page. There was no parallel fetching.

Sorting by size

Current:

Histogram {
  min: 78,
  max: 1048,
  mean: 588.3714285714286,
  exceeds: 0,
  stddev: 305.3454330252669,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 78,
    50 => 549,
    75 => 870,
    87.5 => 983,
    93.75 => 1019,
    96.875 => 1023,
    98.4375 => 1048,
    100 => 1048
  }
}

After adding index on (library_id, media_type, size) on table libraryItems:

Histogram {
  min: 37,
  max: 70,
  mean: 46.82857142857143,
  exceeds: 0,
  stddev: 5.940109254818776,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 37,
    50 => 46,
    75 => 49,
    87.5 => 53,
    93.75 => 55,
    96.875 => 55,
    98.4375 => 70,
    100 => 70
  }
}

Sorting by duration

Current:

histogram: Histogram {
  min: 72,
  max: 258,
  mean: 162.82857142857142,
  exceeds: 0,
  stddev: 46.34497088714818,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 72,
    50 => 154,
    75 => 182,
    87.5 => 238,
    93.75 => 254,
    96.875 => 254,
    98.4375 => 258,
    100 => 258
  }
}

After adding index on duration on table books:

Histogram {
  min: 25,
  max: 61,
  mean: 30.542857142857144,
  exceeds: 0,
  stddev: 6.030128438067621,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 25,
    50 => 29,
    75 => 32,
    87.5 => 34,
    93.75 => 37,
    96.875 => 37,
    98.4375 => 61,
    100 => 61
  }
}

As you can the indices improve performance significantly.

One thing to note here is that even with the added index, the query performance slowly degrades with increasing offset.
This is because the larger the offset, the more records need to be examined to satisfy the query (this is a known issue when working with offsets). There's a technique to deal with this (keyset pagination), but it is a bit more tricky to implement. If you think it's worth the effort, I'll look into it.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/advplyr/audiobookshelf/pull/3767 **Author:** [@mikiher](https://github.com/mikiher) **Created:** 1/1/2025 **Status:** ✅ Merged **Merged:** 1/1/2025 **Merged by:** [@advplyr](https://github.com/advplyr) **Base:** `master` ← **Head:** `book-query-optimizations` --- ### 📝 Commits (3) - [`754c121`](https://github.com/advplyr/audiobookshelf/commit/754c1211683b1587e3a9296e5778a193176d4f53) Add libraryItem size index - [`0444829`](https://github.com/advplyr/audiobookshelf/commit/0444829a9f99b1d42ec00ce228f8c22eaa116e05) Add index on duration - [`46247ec`](https://github.com/advplyr/audiobookshelf/commit/46247ecf7897af954c965d58ca25b37bda72f4da) Update migrations changelog ### 📊 Changes **4 files changed** (+90 additions, -3 deletions) <details> <summary>View changed files</summary> 📝 `server/migrations/changelog.md` (+1 -0) ➕ `server/migrations/v2.17.7-add-indices.js` (+83 -0) 📝 `server/models/Book.js` (+3 -3) 📝 `server/models/LibraryItem.js` (+3 -0) </details> ### 📄 Description ## Brief summary This adds indices to significantly improve database library query time when sorting by size or duration. ## Which issue is fixed? There's no issue. I encountered this while working on PR #3726. ## In-depth Description The following query in `libraryItemsBookFilters.js` is where most of books fetch time is spent and its performance heavily depends on `sortOrder`. When fast scrolling on the library page happens, this can lead to choking on the server if a number of page requests are handled concurrently. ```js const { rows: books, count } = await Database.bookModel.findAndCountAll({ where: bookWhere, distinct: true, attributes: bookAttributes, replacements, include: [ { model: Database.libraryItemModel, required: true, where: libraryItemWhere, include: libraryItemIncludes }, seriesInclude, authorInclude, ...bookIncludes ], order: sortOrder, subQuery: false, limit: limit || null, offset }) ``` I fixed a couple of the easy cases by introducing indices on size and duration. This leads to a significant improvement in the query performance when paging through the library with sortBy size or duration. I only dealt with size and duration for now because they were the easiest. Other sort orders are either already covered by existing indices, or they're trickier to optimize because of the existing database schema (e.g. sorting by author). Maybe I'll try to improve them in a future PR. ## How have you tested this? I tested by running 35 consecutive page requests, by scrolling down the library page. There was no parallel fetching. ### Sorting by size Current: ``` Histogram { min: 78, max: 1048, mean: 588.3714285714286, exceeds: 0, stddev: 305.3454330252669, count: 35, percentiles: SafeMap(8) [Map] { 0 => 78, 50 => 549, 75 => 870, 87.5 => 983, 93.75 => 1019, 96.875 => 1023, 98.4375 => 1048, 100 => 1048 } } ``` After adding index on `(library_id, media_type, size)` on table `libraryItems`: ``` Histogram { min: 37, max: 70, mean: 46.82857142857143, exceeds: 0, stddev: 5.940109254818776, count: 35, percentiles: SafeMap(8) [Map] { 0 => 37, 50 => 46, 75 => 49, 87.5 => 53, 93.75 => 55, 96.875 => 55, 98.4375 => 70, 100 => 70 } } ``` ### Sorting by duration Current: ``` histogram: Histogram { min: 72, max: 258, mean: 162.82857142857142, exceeds: 0, stddev: 46.34497088714818, count: 35, percentiles: SafeMap(8) [Map] { 0 => 72, 50 => 154, 75 => 182, 87.5 => 238, 93.75 => 254, 96.875 => 254, 98.4375 => 258, 100 => 258 } } ``` After adding index on `duration` on table `books`: ``` Histogram { min: 25, max: 61, mean: 30.542857142857144, exceeds: 0, stddev: 6.030128438067621, count: 35, percentiles: SafeMap(8) [Map] { 0 => 25, 50 => 29, 75 => 32, 87.5 => 34, 93.75 => 37, 96.875 => 37, 98.4375 => 61, 100 => 61 } } ``` As you can the indices improve performance significantly. One thing to note here is that even with the added index, the query performance slowly degrades with increasing offset. This is because the larger the offset, the more records need to be examined to satisfy the query (this is a known issue when working with offsets). There's a technique to deal with this (keyset pagination), but it is a bit more tricky to implement. If you think it's worth the effort, I'll look into it. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
adam added the pull-request label 2026-04-25 00:18:13 +02:00
adam closed this issue 2026-04-25 00:18:13 +02:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/audiobookshelf#4075