[PR #2099] [MERGED] Fuzzy Matching V1 #3651

Closed
opened 2026-04-25 00:16:30 +02:00 by adam · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/2099
Author: @mikiher
Created: 9/15/2023
Status: Merged
Merged: 9/22/2023
Merged by: @advplyr

Base: master ← Head: Fuzzy-Matching


📝 Commits (4)

  • ac746f1 Fuzzy Matching V1
  • 67bbe21 Make quick-match more conservative
  • 81a9b8d Merge branch 'advplyr:master' into Fuzzy-Matching
  • 61c4860 Add jsdocs to BookFinder search functions

📊 Changes

2 files changed (+131 additions, -28 deletions)

📝 server/finders/BookFinder.js (+130 -27)
📝 server/scanner/Scanner.js (+1 -1)

📄 Description

Problem:
Audiobookshelf requires a fairly strict folder structure. However, users sometimes have many books in existing folders that adhere to different (or no) standards, and they might be reluctant to fix their directory structure. As a result, book titles and authors are read incorrectly, and matching usually returns no results or wrong ones, which forces users to manually fix the title and author before matching.

The option to prefer audio metadata over folder names somewhat improves the situation, but does not fix it, and is also not enabled by default.

Proposal:
As a first step, I'd like to suggest heuristic fuzzy matching that kicks in if the initial title and author search returns no results. (A rudimentary version of this already exists in the code, potentially sending one additional search request with a "clean" version of the title and author; it is subsumed by the new proposal.) The steps are:

  • If the initial search returns no results, we first further clean the title, and then heuristically split it into hyphen-separated parts.
  • We then create a Set of title candidates, and add each part to the set. We also try to generate additional title candidates by applying various heuristics on each part, and add those candidates to the set as well.
  • The resulting list of unique candidates is then heuristically sorted to minimize the number of additional search requests while still keeping the request as specific as possible.
  • Additional search requests are then sent until one returns results, or until maxFuzzySearches (the maximum number of allowed additional search requests) has been reached.
  • If no results were found, the search requests are repeated without the author (again, until maxFuzzySearches has been reached).
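
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `server/finders/BookFinder.js`: the function names, the bracket-stripping heuristic, the longest-first ordering, and the single search budget shared between the with-author and without-author passes are all assumptions. Only `maxFuzzySearches` is a name taken from the description.

```javascript
// Hypothetical sketch of the fallback flow; names other than
// maxFuzzySearches are illustrative and not taken from BookFinder.js.

/** Split a title on " - " and collect unique title candidates. */
function buildTitleCandidates(title) {
  const candidates = new Set()
  const parts = title.split(/\s+-\s+/).map((p) => p.trim()).filter(Boolean)
  for (const part of parts) {
    candidates.add(part)
    // Example heuristic (an assumption): drop bracketed or parenthesised text
    const stripped = part.replace(/\s*[\[(].*?[\])]\s*/g, ' ').replace(/\s+/g, ' ').trim()
    if (stripped) candidates.add(stripped)
  }
  // Example ordering (an assumption): longer, more specific candidates first
  return [...candidates].sort((a, b) => b.length - a.length)
}

/**
 * Run the initial search, then up to maxFuzzySearches additional searches.
 * searchFn(title, author) is a caller-supplied async function returning an array.
 */
async function fuzzySearch(searchFn, title, author, maxFuzzySearches) {
  let results = await searchFn(title, author)
  if (results.length) return results

  const candidates = buildTitleCandidates(title)
  // First pass keeps the author, second pass drops it
  const authors = author ? [author, ''] : ['']
  let numFuzzySearches = 0
  for (const candidateAuthor of authors) {
    for (const candidate of candidates) {
      // Skip the exact query that was already tried
      if (candidate === title && candidateAuthor === author) continue
      if (numFuzzySearches >= maxFuzzySearches) return []
      numFuzzySearches++
      results = await searchFn(candidate, candidateAuthor)
      if (results.length) return results
    }
  }
  return []
}
```

Keeping the candidates in a `Set` means a heuristic that yields an already-known title adds no extra request, which matters because every candidate may cost one call to the metadata provider.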

This proposal is implemented here.
I've evaluated it on 50 books that have audible.com metadata from my unmodified audiobook torrents directory, which has no standard folder structure. The existing matching finds the correct result for only 24% of the books. Fuzzy matching V1 finds the correct result for 96% of the books, and returns it as the top result (@1) for 92%. I have not calculated the average number of additional search requests, but it looks like it is usually between 0 and 3.
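
For a sample of 50 books, these percentages correspond to 12 books (24%), 48 books (96%), and 46 books (92%). The two fuzzy-matching figures measure different things: whether the correct book appears anywhere in the results, and whether it is the first result. A minimal sketch of how such rates could be computed is shown below; the `evaluate` helper and the `{ expected, ranked }` data shape are assumptions made for illustration, not part of the PR.

```javascript
// Hypothetical evaluation helper; the data shape is an assumption.
// Each sample holds the expected book id and the ranked ids a search returned.
function evaluate(samples) {
  let found = 0
  let foundAt1 = 0
  for (const { expected, ranked } of samples) {
    const idx = ranked.indexOf(expected)
    if (idx >= 0) found++ // correct result appears anywhere in the list
    if (idx === 0) foundAt1++ // correct result is the top hit (@1)
  }
  return {
    foundRate: found / samples.length,
    at1Rate: foundAt1 / samples.length
  }
}

// Made-up toy data, only to show the call shape
const rates = evaluate([
  { expected: 'a', ranked: ['a', 'x'] },
  { expected: 'b', ranked: ['x', 'b'] },
  { expected: 'c', ranked: [] },
  { expected: 'd', ranked: ['d'] }
])
console.log(rates)
```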


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

adam added the pull-request label 2026-04-25 00:16:30 +02:00
adam closed this issue 2026-04-25 00:16:30 +02:00

Reference: starred/audiobookshelf#3651