mirror of
https://github.com/advplyr/audiobookshelf.git
synced 2026-05-30 23:40:40 +02:00
Some books not matching #10
Closed
opened 2026-04-24 22:56:25 +02:00 by adam
·
11 comments
Reference: starred/audiobookshelf#10
Originally created by @Merijeek on GitHub (Aug 28, 2021).
So...I've got some books not matching. I'm guessing the title is off by a tiny bit.
What is the matching source? It would be good to know so that I can look them up manually and rename appropriately so that matching works correctly.
@advplyr commented on GitHub (Aug 29, 2021):
The BookFinder is weak right now. There are just not many good options.
First it checks Open Library.
If it doesn't find a very close match, it also searches LibGen.
I am looking into setting up a separate book database.
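A minimal sketch of the two-step lookup described above, not the project's actual BookFinder code. The names `findBook`, `isCloseMatch`, `searchOpenLibrary` and `searchLibGen` are hypothetical, the search functions are injected rather than tied to any real provider API, and "very close match" is assumed here to mean an edit distance of at most 2 after normalizing case and whitespace.

```javascript
// Classic Levenshtein edit distance, computed with a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, i) => i)
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0] // value from the previous row, previous column
    row[0] = i
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost)
      diagonal = above
    }
  }
  return row[b.length]
}

// Assumed closeness rule: at most `maxDistance` edits after normalization.
function isCloseMatch(candidate, wanted, maxDistance = 2) {
  const norm = (s) => s.trim().toLowerCase().replace(/\s+/g, ' ')
  return levenshtein(norm(candidate), norm(wanted)) <= maxDistance
}

// searchOpenLibrary / searchLibGen are hypothetical async functions that
// resolve to arrays of { title } objects.
async function findBook(title, searchOpenLibrary, searchLibGen) {
  const openLibraryResults = await searchOpenLibrary(title)
  const close = openLibraryResults.find((book) => isCloseMatch(book.title, title))
  if (close) return { source: 'openlibrary', book: close }

  // Only fall back to the second source when the first had no close match.
  const libGenResults = await searchLibGen(title)
  const fallback = libGenResults.find((book) => isCloseMatch(book.title, title))
  return fallback ? { source: 'libgen', book: fallback } : null
}
```

With a rule like this, a folder title that is off by more than a couple of characters from the catalog title would fail to match in both sources, which is consistent with the symptom reported at the top of the thread.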
@Merijeek commented on GitHub (Aug 30, 2021):
Interesting - some of the books I've got aren't in there. For example, Robert Bevan has a good 14 or 15 books, but only his first one (Critical Failures) is in OL, and near as I can tell, even that one isn't in LG.
@advplyr commented on GitHub (Aug 30, 2021):
Google Books might be the most expansive, but it isn't free.
Goodreads was another big one, but they shut that down for public use.
There is a need for a clean, open book database and I think it is worth pursuing once this project is stable.
@Budlyte commented on GitHub (Sep 16, 2021):
While the matching is pretty decent, and I love that the name parsing picks up the Subtitle since it fits my naming scheme, is there a naming scheme that lends itself to parsing the Volume #?
@advplyr commented on GitHub (Sep 17, 2021):
There is not, but it seems logical to add that. What did you have in mind?
@Budlyte commented on GitHub (Sep 17, 2021):
My goodness, either we're on opposite sides of the planet or you don't sleep. If you're missing sleep for this, please don't let us bugging you keep you up.
It looks like libgen contains the series number, though it's randomly placed in the book title or the series name... but a quick regex match for digits in either when matching the book up should be able to pull it out. Though this might break matching titles or series that legitimately contain numbers in their name. Hmmmm. Well the number seems to always be at the beginning or end of either, and I'd say a number in the series name should be taken over the title as those are less common to iterate.
Here's what I'm guessing Libgen kicks out from its API, I stumbled upon a page that gave this and seemed to contain a shitload of links to viruses.
@book{book:{6225211},
title = {Armageddon},
author = {Alanson, Craig},
year = {2019},
series = {Expeditionary Force 8},
url = {libgen.li/file.php?md5=0dde1ba14f89f901e24e36cb72d5e792}}
So, after removing the curlies, a regex match could look like:
Check the series first
It looks like you're already trimming the whitespace from the remainders.
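A minimal sketch of the series-first idea described above, under stated assumptions: the number is taken only from the very beginning or very end of the string, it is limited to 1-3 digits, and the series field wins over the title. The helper names `extractNumber` and `splitSeriesNumber` and the `volumeNumber` field are hypothetical; the `title` and `series` fields mirror the entry quoted above.

```javascript
// Pull a leading or trailing 1-3 digit number out of a string.
// Returns the number as a string plus the trimmed remainder.
function extractNumber(text) {
  const value = (text || '').trim()
  const leading = /^(\d{1,3})\b\s*(.*)$/.exec(value)
  if (leading) return { number: leading[1], rest: leading[2].trim() }
  const trailing = /^(.*?)\s*\b(\d{1,3})$/.exec(value)
  if (trailing) return { number: trailing[2], rest: trailing[1].trim() }
  return { number: null, rest: value }
}

// Check the series first, then fall back to the title.
function splitSeriesNumber(result) {
  const fromSeries = extractNumber(result.series)
  if (fromSeries.number !== null) {
    return { title: result.title, series: fromSeries.rest, volumeNumber: fromSeries.number }
  }
  const fromTitle = extractNumber(result.title)
  return { title: fromTitle.rest, series: result.series || null, volumeNumber: fromTitle.number }
}

const parsed = splitSeriesNumber({ title: 'Armageddon', series: 'Expeditionary Force 8' })
console.log(parsed)
```

As noted above, this still misfires on names that legitimately start or end with a short number; a title such as "Catch 22" would be split into "Catch" and volume 22 whenever the series field carries no number.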
Now, if we want to get into actually parsing the Volume # from the filename.... I understand that my naming scheme is probably unique but others probably use something similar. The parser could look to match the following, again assuming you're using regular expressions:
Getting started in someone else's code is difficult, but if you want to point me towards where this thing is written, I can take a look.
@advplyr commented on GitHub (Sep 18, 2021):
Not a bother at all, the opposite actually, these suggestions and bug catches are taking this project to the next level.
This is added in v1.1.13, I updated the readme with some details.
I didn't include v \d+ because I think it would have too many false positives.
The regex that I went with is
/(-? ?)\b((?:Book|Vol.?|Volume) (\d{1,3}))\b( ?-?)/i
Tested in regex101

Where group 3 (purple) is what gets stored in the Volume Number. The other groups are there to help with removing it from the title.
It is here in the code.
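A small sketch of how the regex quoted above could be applied, assuming the behaviour described in this comment: group 3 becomes the volume number and the whole match is cut out of the title. The regex is copied verbatim; the function name `parseVolume`, the returned field names, and the whitespace cleanup are assumptions, not the project's actual code.

```javascript
// Regex copied verbatim from the comment above. Note that the "." in "Vol.?"
// is unescaped, so it matches any single character, not only a literal dot.
const VOLUME_REGEX = /(-? ?)\b((?:Book|Vol.?|Volume) (\d{1,3}))\b( ?-?)/i

// Hypothetical helper: returns the title without the volume marker,
// plus the captured volume number (as a string) or null.
function parseVolume(title) {
  const match = VOLUME_REGEX.exec(title)
  if (!match) return { title: title.trim(), volumeNumber: null }

  // Cut the full match (including surrounding dashes/spaces) out of the title,
  // then collapse any leftover whitespace. This cleanup step is an assumption.
  const before = title.slice(0, match.index)
  const after = title.slice(match.index + match[0].length)
  const cleaned = (before + ' ' + after).replace(/\s+/g, ' ').trim()

  return { title: cleaned, volumeNumber: match[3] }
}

console.log(parseVolume('Armageddon - Book 8'))
console.log(parseVolume('Vol. 12 Title'))
console.log(parseVolume('Armageddon'))
```

Groups 1 and 4 capture the optional dash and space on either side of the marker, which is what lets the whole "- Book 8" fragment be removed in one step.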
@advplyr commented on GitHub (Sep 18, 2021):
I'm going to keep this issue open because matching with LibGen and OpenLib is still not improved. Volume number and series aren't parsed during matching; it needs a full overhaul.
@Budlyte commented on GitHub (Sep 18, 2021):
lol I'm sitting here writing expressions in crayon, meanwhile you're elegantly writing with a quill.
Thanks a ton for the update, it'll help a lot to fill in that blank area. I assume it's only filling in the Volume # if it's empty. Out and about today, no chance to really test anything yet.
@Budlyte commented on GitHub (Sep 18, 2021):
If you're going to overhaul anyway, getting a Google Books API key is easy. If you add support for it, it wouldn't be unreasonable to require that people get their own keys. LibGen kind of worries me after that wack page I landed on with nonstop popups from it.
@Budlyte commented on GitHub (Sep 19, 2021):
Alright, home now and can scan. It looks like that code only runs on newly created files, which makes sense because the way it is now would null out any manually set volume number. I think this is fine: as people pull in their libraries, figure out what the naming scheme can do, and make those updates, they can just hit Reset and import again.
Great work! Now I think it's a bummer I didn't open a req for this because it would make a nice Closed piece.