[PR #3199] [MERGED] Support accent-insensitive search using SQLean unicode sqlite3 extension #3899

Closed
opened 2026-04-25 00:17:29 +02:00 by adam · 0 comments

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/3199
Author: @mikiher
Created: 7/27/2024
Status: Merged
Merged: 7/29/2024
Merged by: @advplyr

Base: master ← Head: unaccent


📝 Commits (7)

  • 329e9c9 BinaryManager support for libraries and downloading from github release assets
  • 6c379fc Add /unicode* to .gitignore
  • dedf6e5 Support accent-insensitive matching using the sqlean sqlite3 unicode extension
  • c3f3fca Remove dependency on libs/ffbinaries from BinaryManager test
  • 3d2b2e4 Set execution permission for downloaded binaries
  • 2c453a3 Remove redundant console.log() message
  • 3ac604c Remove ffmpeg binaries install step from debian preinst script

📊 Changes

10 files changed (+662 additions, -688 deletions)


📝 .gitignore (+2 -1)
📝 build/debian/DEBIAN/preinst (+0 -24)
📝 server/Database.js (+62 -0)
📝 server/Server.js (+2 -5)
➖ server/libs/ffbinaries/index.js (+0 -315)
📝 server/managers/BinaryManager.js (+278 -92)
📝 server/utils/queries/authorFilters.js (+2 -4)
📝 server/utils/queries/libraryItemsBookFilters.js (+20 -21)
📝 server/utils/queries/libraryItemsPodcastFilters.js (+11 -14)
📝 test/server/managers/BinaryManager.test.js (+285 -212)

📄 Description

This fixes #2678.

This PR has two main parts:

1. Refactoring and modifications in BinaryManager to support new capabilities

  • Ability to download binaries from multiple sources.
  • Ability to download binaries from GitHub release assets
  • Support for downloading libraries

These BinaryManager changes allowed me to:

  • move away from downloading ffmpeg and ffprobe from ffbinaries.com, replacing it with the ffbinaries-prebuilt GitHub repository, which contains the same executables.
  • download the SQLean unicode library from the sqlean GitHub repository.
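As a rough illustration of what downloading from GitHub release assets involves, the sketch below builds an asset download URL from its parts. The helper name `releaseAssetUrl` and the owner, repository, tag, and asset values are hypothetical placeholders, not names taken from BinaryManager; only the general `releases/download/<tag>/<asset>` URL shape is assumed.

```javascript
// Hypothetical helper (not part of BinaryManager): builds the download URL
// for a single asset attached to a GitHub release.
function releaseAssetUrl(owner, repo, tag, assetName) {
  // Tag and asset name are encoded so unusual characters stay URL-safe.
  const safeTag = encodeURIComponent(tag)
  const safeAsset = encodeURIComponent(assetName)
  return `https://github.com/${owner}/${repo}/releases/download/${safeTag}/${safeAsset}`
}

// Placeholder values, for illustration only.
const url = releaseAssetUrl('example-owner', 'example-repo', 'v1.0', 'example-lib-linux-x64.zip')
console.log(url)
```

Keeping the URL construction in one small function makes it straightforward to support several sources: each source only needs to supply its own owner, repository, tag, and asset name.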

I also made BinaryManager run for all platforms, so it's able to download the unicode library (which is needed for the second part). Up until now it was only running for dev and Windows platforms, but I believe it's safe to run it on all supported platforms, with no changes or side-effects (if ffmpeg and ffprobe already exist on the system and are identified using the findRequiredBinaries() method, those will be used and nothing will be downloaded).
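The "use what is already installed, download only what is missing" behaviour can be sketched roughly as follows. This is not the actual `findRequiredBinaries()` implementation; `findOnPath` and `missingBinaries` are hypothetical helpers, and searching only the `PATH` environment variable is a simplifying assumption.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper: returns the first existing "<dir>/<name>" found in a
// PATH-style string, or null if none exists.
function findOnPath(name, pathEnv = process.env.PATH || '', exists = fs.existsSync) {
  for (const dir of pathEnv.split(path.delimiter)) {
    if (!dir) continue // skip empty PATH entries
    const candidate = path.join(dir, name)
    if (exists(candidate)) return candidate
  }
  return null
}

// Hypothetical helper: lists the binaries that were not found and would
// therefore still need to be downloaded.
function missingBinaries(names, find = findOnPath) {
  return names.filter((name) => find(name) === null)
}

console.log(missingBinaries(['ffmpeg', 'ffprobe']))
```

If the returned list is empty, nothing needs to be downloaded, which matches the no-side-effects case described above.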

As a reminder, if BinaryManager is not able to find a binary it needs, it downloads it and puts it in one of two locations:

  • mainInstallDir: global.appRoot (or, in the case of a pkg-ed binary, the directory where that binary is located)
  • altInstallDir: (if mainInstallDir is not writable) the Audiobookshelf config directory (which should always be writable)
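The fallback between the two locations can be sketched as below. This is a minimal illustration assuming a simple write-permission check; `isWritable` and `chooseInstallDir` are hypothetical names, and the real BinaryManager logic may differ.

```javascript
const fs = require('fs')

// Hypothetical helper: true if the current process may write to the directory.
function isWritable(dir) {
  try {
    fs.accessSync(dir, fs.constants.W_OK)
    return true
  } catch {
    return false
  }
}

// Hypothetical helper: prefer mainInstallDir and fall back to altInstallDir
// when the main directory is not writable.
function chooseInstallDir(mainInstallDir, altInstallDir, canWrite = isWritable) {
  return canWrite(mainInstallDir) ? mainInstallDir : altInstallDir
}

// Example with placeholder paths and a stubbed check that reports the main
// directory as read-only.
console.log(chooseInstallDir('/app', '/config', () => false))
```

Passing the writability check as a parameter keeps the decision testable without touching the real file system.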

I believe the BinaryManager is a good mechanism to make sure all required binaries are available (whether they're obtained externally or by BinaryManager itself), and it removes the hassle of deploying those binaries in the various supported platforms.

2. Support for accent-insensitive search by using functions from the SQLean unicode sqlite3 extension

  • The extension is downloaded by BinaryManager (like ffmpeg, it can also be added externally and passed through the SQLEAN_UNICODE_PATH environment variable)
  • It is loaded into sqlite3 at Database init time
  • During searches, the query is normalized by calling functions from the extension, and is matched against the fields that require accent-insensitive matching (which are normalized in the same way).
  • It looks like using the normalized match doesn't affect query performance in any perceptible way - I benchmarked on a library containing ~4000 books and db query execution time seems to remain roughly unchanged (2-3 ms per query).
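To illustrate the idea of normalizing both sides before comparing, here is a plain JavaScript analogue. It is not how the PR does it (there the normalization runs inside SQLite through the extension's functions); `normalizeForSearch` and `matches` are hypothetical names, and stripping combining marks after NFD decomposition is an assumed approximation of accent removal.

```javascript
// Hypothetical helper: decompose accented characters (NFD), drop the
// combining marks, then lowercase the result.
function normalizeForSearch(text) {
  return text.normalize('NFD').replace(/\p{M}/gu, '').toLowerCase()
}

// Hypothetical helper: substring match where both the field and the query
// are normalized in the same way.
function matches(field, query) {
  return normalizeForSearch(field).includes(normalizeForSearch(query))
}

console.log(matches('Les Misérables', 'miserables'))
console.log(matches('Charlotte Brontë', 'BRONTE'))
```

The important property is symmetry: the stored field and the search query go through the same normalization, so a query typed without accents still finds accented titles, and the reverse also holds.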

Note regarding pkg-ed binaries: the unicode extension cannot be packaged into a pkg-ed binary, since it is loaded by the sqlite3 native code, which doesn't have access to the pkg virtual file system. It therefore has to be downloaded as a dependency for pkg-ed binaries as well.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
