[PR #3468] [MERGED] Nunicode integration #3983

Closed
opened 2026-04-25 00:17:50 +02:00 by adam · 0 comments

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/3468
Author: @mikiher
Created: 9/29/2024
Status: Merged
Merged: 10/1/2024
Merged by: @advplyr

Base: master ← Head: nunicode-intergration


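📝 Commits (6)

501dc93 Add Nunicode sqlite extension integration
37eae34 Remove debug messages
7108501 Add libnusqlite3 to gitignore
4a7ada2 Switch to nunicode-binaries v1.1
4e8b472 Merge branch 'nunicode-intergration' of https://github.com/mikiher/audiobookshelf into nunicode-intergration
0865326 Fix to NUSQLITE3_PATH in index.js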

📊 Changes

9 files changed (+365 additions, -100 deletions)

📝 .gitignore (+1 -0)
📝 Dockerfile (+25 -9)
📝 index.js (+1 -0)
📝 server/Database.js (+70 -44)
📝 server/managers/BinaryManager.js (+147 -35)
📝 server/utils/queries/authorFilters.js (+2 -2)
📝 server/utils/queries/libraryItemsBookFilters.js (+6 -6)
📝 server/utils/queries/libraryItemsPodcastFilters.js (+5 -4)
📝 test/server/managers/BinaryManager.test.js (+108 -0)

📄 Description

This is my second attempt at integrating an SQLite extension that provides Unicode-aware case-folding and unaccenting, which fixes #2678 properly.

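After numerous attempts and discussions with @devnoname120, I settled on the Nunicode SQLite extension (https://bitbucket.org/alekseyt/nunicode/src/master/), which, compared to the previous extension I tried (SQlean/unicode, https://github.com/nalgeon/sqlean/blob/main/docs/unicode.md):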

  • Does not override the SQLite NOCASE collation (which led to crashes with the previous extension, because indices were not properly re-indexed).
  • Provides superior case-folding support, which the previous extension lacked (e.g. Maße is case-folded to masse).
  • Like the previous extension, supports Unicode-aware unaccent(), upper() and lower() functions, and a Unicode-aware case-insensitive LIKE.
  • Like the previous extension, has no dependencies and builds into a very small library (~300 KB).
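
To make the difference between lowercasing, case folding and unaccenting concrete, here is a rough illustration in plain JavaScript. It does not use Nunicode and is not how the extension implements these operations; `roughCaseFold` and `roughUnaccent` are hypothetical helper names, and the upper-then-lower trick only approximates full case folding.

```javascript
// Plain lowercasing leaves 'ß' untouched, so 'Maße' would never match 'masse'.
// Upper-casing first expands 'ß' to 'SS'; lower-casing then yields 'ss'.
// This is only an approximation of Unicode case folding, used for illustration.
function roughCaseFold(text) {
  return text.toUpperCase().toLowerCase()
}

// Approximate "unaccent": decompose characters (NFD), then drop combining marks.
function roughUnaccent(text) {
  return text.normalize('NFD').replace(/\p{M}/gu, '')
}

console.log('Maße'.toLowerCase())    // 'maße'
console.log(roughCaseFold('Maße'))   // 'masse'
console.log(roughUnaccent('Amélie')) // 'Amelie'
```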

I set up a separate GitHub project, nunicode-binaries (https://github.com/mikiher/nunicode-binaries), which builds the extension from source and publishes downloadable binaries for all the platforms the Audiobookshelf server supports (linux-x64, linux-arm64, win-x64, and osx-arm64). The build and release are implemented as GitHub workflows.
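
As a minimal sketch of how a Node.js process could pick the right binary, the snippet below maps `process.platform` and `process.arch` to the platform keys listed above and composes a direct "latest release" download URL. GitHub serves release assets under `/releases/latest/download/<asset>`; the asset file name used here is a hypothetical placeholder, not the naming scheme of nunicode-binaries, and the helper names are illustrative only.

```javascript
// Repository taken from the PR description; asset naming below is an assumption.
const RELEASES_BASE = 'https://github.com/mikiher/nunicode-binaries/releases/latest/download'

// Node's platform-arch pairs mapped to the platform keys named in the description.
const PLATFORM_KEYS = {
  'linux-x64': 'linux-x64',
  'linux-arm64': 'linux-arm64',
  'win32-x64': 'win-x64',
  'darwin-arm64': 'osx-arm64'
}

// Returns the platform key, or null when the platform is not in the supported list.
function platformKey(platform = process.platform, arch = process.arch) {
  return PLATFORM_KEYS[`${platform}-${arch}`] ?? null
}

// Returns a download URL, or null so the caller can continue without the extension.
function latestAssetUrl(platform, arch) {
  const key = platformKey(platform, arch)
  if (!key) return null
  return `${RELEASES_BASE}/libnusqlite3-${key}.zip` // hypothetical asset name
}

console.log(latestAssetUrl('win32', 'x64'))
```

Returning `null` for unknown platforms, rather than throwing, fits an extension that the server is allowed to run without.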

This PR has the following changes:

  • BinaryManager now downloads the extension from nunicode-binaries (the download is done directly from the latest release page, not through the API, which is rate-limited).
  • BinaryManager now also saves version info for downloaded libraries (and checks it against the versions defined as valid).
  • BinaryManager was also hardened against various failures.
  • The Dockerfile was modified to download and install the extension from nunicode-binaries.
  • The extension is loaded at Database initialization.
  • The extension has been defined as optional: if it is not found or cannot be downloaded for any reason, the server will start without it.
    • This is done in order to deal with any unexpected failures during the initial rollout. After this runs in prod for a while, I'd like to make it required, so we don't have different search behaviours.
  • Searches now use the following algorithm (slightly modified from the previous attempt):
    • If the search query does not contain accents, it is matched against an unaccented column:
      • unaccent(column) LIKE '%query%'
    • If the search query contains accents, it is matched against the original column:
      • column LIKE '%qüery%'
        • This provides a way to match without accent normalization.
    • All searches are Unicode-aware and case-insensitive, due to the re-implementation of the LIKE operator.
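
The search rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `hasAccents` and `buildTextMatch` are hypothetical names, accent detection is approximated by looking for combining marks after NFD decomposition, and `unaccent()` plus the Unicode-aware `LIKE` are assumed to be provided by the loaded extension.

```javascript
// Hypothetical accent check: a query "has accents" if decomposing it (NFD)
// produces at least one combining mark. The PR's actual check may differ.
function hasAccents(query) {
  return /\p{M}/u.test(query.normalize('NFD'))
}

// Builds a parameterized WHERE fragment for a single column.
// `column` must be a trusted identifier; only the query value is bound.
// LIKE wildcards (% and _) inside the query are not escaped in this sketch.
function buildTextMatch(column, query) {
  // Accented query: match the original column (no accent normalization).
  // Unaccented query: match the unaccented column.
  const target = hasAccents(query) ? column : `unaccent(${column})`
  return { sql: `${target} LIKE :query`, params: { query: `%${query}%` } }
}

console.log(buildTextMatch('title', 'query').sql) // 'unaccent(title) LIKE :query'
console.log(buildTextMatch('title', 'qüery').sql) // 'title LIKE :query'
```

With this rule, an unaccented query such as `query` also finds rows containing `qüery`, while typing the accent explicitly narrows the match to rows that contain it.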

The extension integration was tested on Windows, and on linux-x64 and linux-arm64 Docker (and passes our integration test).


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

adam added the pull-request label 2026-04-25 00:17:50 +02:00
adam closed this issue 2026-04-25 00:17:50 +02:00
Reference: starred/audiobookshelf#3983