Mirror of https://github.com/advplyr/audiobookshelf.git, synced 2026-05-30 23:40:40 +02:00
[Bug]: Too heavy on the I/O during library scan #261
Closed · opened 2026-04-24 23:02:30 +02:00 by adam · 16 comments
Labels: bug
No milestone, project, assignees, due date, or dependencies set.
Reference: starred/audiobookshelf#261
Originally created by @Inrego on GitHub (Apr 8, 2022).
Describe the issue
When scanning the library, it's way too heavy on the I/O.
My audiobooks are hosted on Google Drive, mounted with rclone. When scanning a library folder with only about 10 books in it, the whole server freezes completely. I can't even reboot the server properly; I have to power cycle it.
Audiobookshelf is running in a Docker container, and since it happens during the library scan, I can only imagine it's related to I/O.
I've been running this same setup for a few years without issues. Only Audiobookshelf is causing this kind of issue.
Other similar services I'm running in the same setup, which don't run into this problem:
Steps to reproduce the issue
Audiobookshelf version
v1.7.2
How are you running audiobookshelf?
Docker
@Inrego commented on GitHub (Apr 8, 2022):
If I add just a few books at a time to the mounted folder, it can scan them. However, it still freezes up for a little while (maybe around 30 seconds per book).
@advplyr commented on GitHub (Apr 24, 2022):
This was improved a bit in v2 but still needs work
@lduesing commented on GitHub (Jun 22, 2022):
The problem is that you run ffprobe on every book you find, all in parallel. When I tried to reproduce the problem, I had more than 80 ffprobe processes running...
@lduesing commented on GitHub (Jun 22, 2022):
Solution:
In the documentation of node-ffprobe:
@Inrego commented on GitHub (Jun 22, 2022):
That's great. I was about to suggest using a semaphore with a setting to control the maximum number of processes. But if it's already handled by node-ffprobe with a simple parameter, I guess that's the easier fix!
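The semaphore idea can be sketched in a few lines of Node.js. This is a minimal illustration, not code from audiobookshelf or node-ffprobe: `Semaphore`, `probeAll`, and `probeFile` are hypothetical names, and `probeFile` stands in for whatever actually spawns ffprobe. The `maxProcesses` argument plays the role of the proposed setting.

```javascript
// Minimal counting semaphore for capping how many async jobs run at once.
// Hypothetical helper for illustration; not part of audiobookshelf or node-ffprobe.
class Semaphore {
  constructor(max) {
    if (!Number.isInteger(max) || max < 1) throw new RangeError('max must be a positive integer');
    this.max = max;
    this.active = 0;
    this.waiting = []; // resolvers for callers parked in acquire()
  }

  acquire() {
    if (this.active < this.max) {
      this.active++;
      return Promise.resolve();
    }
    // No free slot: park the caller until release() hands one over.
    return new Promise((resolve) => this.waiting.push(resolve));
  }

  release() {
    const next = this.waiting.shift();
    if (next) {
      // Hand the slot straight to the next waiter; the active count is unchanged.
      next();
    } else {
      this.active--;
    }
  }

  // Run fn() once a slot is free, always releasing the slot afterwards.
  async run(fn) {
    await this.acquire();
    try {
      return await fn();
    } finally {
      this.release();
    }
  }
}

// Probe every file, but never more than `maxProcesses` at once.
// `probeFile` is an assumed async function that runs ffprobe on one file.
function probeAll(files, probeFile, maxProcesses) {
  const sem = new Semaphore(maxProcesses);
  return Promise.all(files.map((f) => sem.run(() => probeFile(f))));
}
```

Handing a released slot directly to the next waiter, instead of decrementing and letting callers race for it, keeps the number of running jobs from ever exceeding the limit, and `Promise.all` still returns results in input order.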
@advplyr commented on GitHub (Jun 22, 2022):
I don't think we want to run them synchronously. The heavy I/O comes from many ffprobe processes running at once, which I improved a bit in v2, but we are only using a single thread. We could further reduce the number of ffprobe processes running asynchronously and also split them across multiple threads.
@lduesing commented on GitHub (Jun 22, 2022):
Sorry, but each ffprobe process uses roughly 120 MiB of virtual RAM. With a directory of 80 audiobooks on a Raspberry Pi running Docker, your image gets killed. Scanning doesn't happen every day, so I don't see why serialized scanning would be a problem.
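Taking the figures above at face value, the total for a directory of 80 audiobooks works out as follows. This is virtual memory, so actual resident usage is likely lower, but the order of magnitude is well beyond the few GB of RAM on a typical Raspberry Pi.

```latex
80 \times 120\,\text{MiB} = 9600\,\text{MiB} = \frac{9600}{1024}\,\text{GiB} \approx 9.4\,\text{GiB}
```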
@advplyr commented on GitHub (Jun 22, 2022):
I don't think the image being killed with 80 audiobooks is a common issue. This is highly dependent on your specs, of course, and one of the biggest factors in reduced scanner performance is using a remote file system.
We have users with 10k+ audiobook libraries where scans can take hours. If we utilize all the cores of your processor instead of just one then the scan time can be a fraction of what it is now.
Currently it works like this: all the audio files that need to be scanned are split into batches of at most 2.5 GB. So a 1 GB audio file and three 500 MB audio files would make up a single batch. The batches run one after another (synchronously), and within each batch ffprobe is executed on every audio file asynchronously on a single thread.
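The batching scheme described above might look roughly like this. It is a sketch under assumptions: the greedy in-order grouping, decimal units (1 GB = 1000³ bytes), and the names `batchBySize`, `scanInBatches`, and `probeFile` are illustrative, not taken from the audiobookshelf scanner.

```javascript
// Hypothetical sketch of size-capped batching; the real scanner may differ.
const GB = 1000 ** 3; // decimal units, matching the "2.5 GB" / "500 MB" wording (assumption)
const MAX_BATCH_BYTES = 2.5 * GB;

// Group files (objects with a `size` in bytes) into consecutive batches whose
// total size stays <= maxBytes. A file larger than maxBytes gets its own batch.
function batchBySize(files, maxBytes = MAX_BATCH_BYTES) {
  const batches = [];
  let current = [];
  let currentBytes = 0;
  for (const file of files) {
    if (current.length > 0 && currentBytes + file.size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(file);
    currentBytes += file.size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Batches run one after another; files inside a batch are probed concurrently
// (interleaved on Node's single thread, not in parallel threads).
async function scanInBatches(files, probeFile, maxBytes = MAX_BATCH_BYTES) {
  const results = [];
  for (const batch of batchBySize(files, maxBytes)) {
    results.push(...(await Promise.all(batch.map(probeFile))));
  }
  return results;
}
```

With the example from the comment (one 1 GB file plus three 500 MB files), `batchBySize` returns a single batch, since the total is exactly 2.5 GB; a fifth 500 MB file would start a second batch.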
How much RAM an ffprobe process uses is highly dependent on the size of the audio file, which is why I chose to split up the batches by file size.
My proposal to increase performance would be to run the ffprobe commands in parallel on X threads, where X is the number of processor cores. A 4-core processor would start 4 threads, each executing a single ffprobe at a time. I think I'm actually agreeing with you, just proposing that we spread the workload across the processor.
I'm a fan of projects that are highly customizable as long as it doesn't look like a jumbled up mess in the UI, so I think having a setting to adjust the variable X could be a nice addition.
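A pool of X worker threads, with X defaulting to the core count and overridable by a setting, could be sketched as follows. Everything here is hypothetical: `WorkerPool`, `defaultPoolSize`, and the inline worker body are illustrations, and the worker just sums `1..n` where a real scanner would run ffprobe and parse its output. `os.availableParallelism()` only exists in newer Node.js releases, so the sketch falls back to `os.cpus().length`.

```javascript
const os = require('os');
const { Worker } = require('worker_threads');

// Hypothetical default for the proposed "X": one worker per available core.
function defaultPoolSize() {
  // os.availableParallelism() exists only in newer Node.js (18.14+/19.4+).
  const n = typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;
  return Math.max(1, n);
}

// Placeholder worker body. A real scanner worker would spawn ffprobe here;
// this one sums 1..n so the sketch runs without ffprobe installed.
const WORKER_SOURCE = `
const { parentPort } = require('worker_threads');
parentPort.on('message', ({ n }) => {
  let sum = 0;
  for (let i = 1; i <= n; i++) sum += i;
  parentPort.postMessage(sum);
});
`;

class WorkerPool {
  constructor(size = defaultPoolSize()) {
    this.queue = [];          // jobs waiting for a free worker
    this.idle = [];           // workers with no current job
    this.current = new Map(); // worker -> job it is running
    this.workers = [];
    for (let i = 0; i < size; i++) this._spawn();
  }

  _spawn() {
    const worker = new Worker(WORKER_SOURCE, { eval: true });
    worker.on('message', (result) => {
      const job = this.current.get(worker);
      this.current.delete(worker);
      if (job) job.resolve(result);
      this.idle.push(worker); // each worker handles one job at a time
      this._dispatch();
    });
    worker.on('error', (err) => {
      // An uncaught error ends the thread: fail its job and drop the worker.
      const job = this.current.get(worker);
      this.current.delete(worker);
      if (job) job.reject(err);
      this.workers = this.workers.filter((w) => w !== worker);
      this.idle = this.idle.filter((w) => w !== worker);
    });
    this.workers.push(worker);
    this.idle.push(worker);
  }

  // Queue a job; resolves with whatever the worker posts back.
  run(payload) {
    return new Promise((resolve, reject) => {
      this.queue.push({ payload, resolve, reject });
      this._dispatch();
    });
  }

  // Hand queued jobs to idle workers until one of the two runs out.
  _dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const job = this.queue.shift();
      this.current.set(worker, job);
      worker.postMessage(job.payload);
    }
  }

  // Terminate all threads so the process can exit.
  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}
```

Since ffprobe already runs as a separate OS process, the main effect of a pool like this when wrapping ffprobe is likely to bound how many run at once; the threads themselves pay off mostly for CPU-bound JavaScript work such as parsing results. Crashed workers are dropped rather than respawned, which a production pool would need to handle.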
@advplyr commented on GitHub (Jun 22, 2022):
I want to add that the terms asynchronous/synchronous might be a bit misleading, because in Node.js asynchronous doesn't mean running in parallel, whereas in other languages the two are often used interchangeably.
@hobesman commented on GitHub (Jun 22, 2022):
All the more so because in common parlance, synchronous means "at the same time" and asynchronous means "not at the same time".
@advplyr commented on GitHub (Jun 22, 2022):
I've been doing so much node.js the last few years that I didn't realize how confusing it would sound at first. Node.js was built to run on a single thread and so "asynchronous" is really saying "handle this workload as optimally as you can, I don't care in which order you do it".
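The difference between "asynchronous" and "parallel" in Node.js can be shown with a small, self-contained demo (the function names are illustrative). Tasks that `await` overlap in time because each one yields to the event loop, but a task doing synchronous CPU work holds the single thread until it finishes.

```javascript
// Illustrative demo: "async" in Node.js means interleaved on one thread,
// not run in parallel.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Spins the CPU without yielding, like heavy synchronous parsing would.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* hold the thread */ }
}

// Yields at `await`, so other tasks can run in the meantime.
async function waitingTask(name, log) {
  log.push(`${name} start`);
  await sleep(10);
  log.push(`${name} end`);
}

// Never awaits, so it runs start-to-finish before anything else gets a turn.
async function blockingTask(name, log) {
  log.push(`${name} start`);
  busyWait(20);
  log.push(`${name} end`);
}

async function demo() {
  const waiting = [];
  await Promise.all([waitingTask('A', waiting), waitingTask('B', waiting)]);
  const blocking = [];
  await Promise.all([blockingTask('C', blocking), blockingTask('D', blocking)]);
  return { waiting, blocking };
}
```

`demo()` resolves with `waiting` equal to `['A start', 'B start', 'A end', 'B end']` (interleaved) and `blocking` equal to `['C start', 'C end', 'D start', 'D end']` (strictly one after the other), even though both pairs were started with the same `Promise.all`.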
@hobesman commented on GitHub (Jun 22, 2022):
I remember encountering this for the first time with Core for SmartThings, the home automation platform. The user can create complex conditions with if, if not, xor, and so on for controlling smart devices. I remember thinking "who decided that synchronous commands mean sequential and asynchronous means they can run in parallel?" It was very counterintuitive for the uninitiated user. Even if those are the terms ultimately used in the underlying code, I agree the discussion and/or the interface should probably avoid them.
@rasmuslos commented on GitHub (Jun 23, 2022):
You can spawn workers in Node.js. They run on a separate thread but can communicate with the main thread:
https://nodejs.org/api/worker_threads.html
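A minimal round trip with `worker_threads` looks like this. The function name `runInWorker` and the string-reversing job are placeholders chosen for illustration; the worker's source is inlined with the `eval: true` option so the example fits in one file.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical example: offload one job to a worker thread and await its reply.
function runInWorker(input) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `
      const { parentPort, workerData } = require('worker_threads');
      // Stand-in for real work (e.g. parsing probe output): reverse a string.
      parentPort.postMessage(workerData.split('').reverse().join(''));
      `,
      { eval: true, workerData: input }
    );
    worker.once('message', resolve); // first message is the result
    worker.once('error', reject);    // uncaught exception inside the worker
    worker.once('exit', (code) => {
      // A non-zero exit without a result means the job failed.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

`runInWorker('abc')` resolves to `'cba'`. Passing input through `workerData` suits one-shot jobs; a long-lived worker would instead listen for repeated `message` events on `parentPort`.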
@advplyr commented on GitHub (Jun 23, 2022):
We already use worker threads for making M4B files: https://github.com/advplyr/audiobookshelf/blob/master/server/managers/AbMergeManager.js#L186
We just need to re-build the scanner to use them.
@cnu80 commented on GitHub (Jan 12, 2023):
Hi, I have the same issue with audiobookshelf on my Synology NAS. Too many ffprobe processes.
A max count parameter would be great. Thanks!
@advplyr commented on GitHub (Mar 29, 2023):
I missed this issue but this was fixed a few versions ago. Basically we are capping the number of ffprobe processes.