[Bug]: Too heavy on the I/O during library scan #261

adam commented

2026-04-24 23:02:30 +02:00

Originally created by @Inrego on GitHub (Apr 8, 2022).

Describe the issue

When scanning the library, it's way too heavy on the I/O.
My audiobooks are hosted on Google Drive, mounted with RClone. When scanning a library folder with just about 10 books in it, the whole server completely freezes up. I can't even reboot the server properly, I have to power cycle it.

Audiobookshelf is running in a Docker container, and since it happens during library scan, I can only imagine it's related to IO.

I've been running this same setup for a few years without issues. Only Audiobookshelf is causing this kind of issue.
Other similar services I'm running in the same setup, which don't run into this problem:

* Plex
* Emby
* Jellyfin
* Radarr
* Sonarr

Steps to reproduce the issue

1. Mount GDrive with rclone
2. Start audiobookshelf docker container with GDrive folder mounted.
3. Scan library

Audiobookshelf version

v1.7.2

How are you running audiobookshelf?

Docker

adam added the bug label 2026-04-24 23:02:30 +02:00

adam closed this issue

2026-04-24 23:02:30 +02:00

adam commented

2026-04-24 23:02:31 +02:00

@Inrego commented on GitHub (Apr 8, 2022):

If I add just a few books to the mounted folder at a time, it can scan it. However, it still freezes up for a little while (maybe around 30 seconds per book).

adam commented

2026-04-24 23:02:31 +02:00

@advplyr commented on GitHub (Apr 24, 2022):

This was improved a bit in v2 but still needs work

adam commented

2026-04-24 23:02:32 +02:00

@lduesing commented on GitHub (Jun 22, 2022):

The problem is that you are running ffprobe for every book you find, all in parallel. I tried to reproduce the problem and had more than 80 ffprobe processes...

adam commented

2026-04-24 23:02:32 +02:00

@lduesing commented on GitHub (Jun 22, 2022):

Solution:
In the documentation of node-ffprobe (https://www.npmjs.com/package/node-ffprobe):

> Additionnally, you can set ffprobe.SYNC to true if you want for a particular reason to launch ffprobe synchronously (for example when used in batch processing of files to avoid too many spawns at once.)

adam commented

2026-04-24 23:02:33 +02:00

@Inrego commented on GitHub (Jun 22, 2022):

That's great. I was about to suggest using a semaphore with a setting to control max number of processes. But if it's already handled by ffprobe by a simple parameter, I guess that's the easier fix!
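
For illustration, here is a minimal sketch of the semaphore idea mentioned above. It is not Audiobookshelf code: `Semaphore`, `runLimited` and the `maxProcesses` setting are hypothetical names, and each task stands in for whatever starts one ffprobe.

```javascript
// Counting semaphore: at most `max` holders at any time.
class Semaphore {
  constructor(max) {
    this.available = max
    this.waiters = []
  }

  async acquire() {
    if (this.available > 0) {
      this.available--
      return
    }
    // No permit free: wait until release() hands one over.
    await new Promise((resolve) => this.waiters.push(resolve))
  }

  release() {
    const next = this.waiters.shift()
    if (next) next() // pass the permit straight to the next waiter
    else this.available++
  }
}

// Run every task, never more than `maxProcesses` (hypothetical setting) at once.
async function runLimited(tasks, maxProcesses) {
  const semaphore = new Semaphore(maxProcesses)
  return Promise.all(
    tasks.map(async (task) => {
      await semaphore.acquire()
      try {
        return await task()
      } finally {
        semaphore.release()
      }
    })
  )
}
```

Something like `runLimited(files.map((f) => () => probe(f)), 4)` would then keep at most four probes in flight, where `probe` is whatever function launches ffprobe for one file.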

adam commented

2026-04-24 23:02:34 +02:00

@advplyr commented on GitHub (Jun 22, 2022):

I don't think we want to run them synchronously. The heavy I/O is because of many ffprobes running at once which I improved a bit on v2, but we are only using a single thread. We could further reduce the number of ffprobes running asynchronously and also split them into multiple threads.

adam commented

2026-04-24 23:02:34 +02:00

@lduesing commented on GitHub (Jun 22, 2022):

Sorry, each ffprobe process uses roughly 120 MiB of virtual RAM. In a directory with 80 audiobooks on a Raspberry Pi in Docker, your image gets killed. Scanning is something that will not happen every day, so I do not see why serialized scanning would be a problem.

adam commented

2026-04-24 23:02:35 +02:00

@advplyr commented on GitHub (Jun 22, 2022):

I don't think the image being killed with 80 audiobooks is a common issue. This is highly dependent on your specs of course and one of the biggest factors in reduced performance of the scanner is using a remote file system.

We have users with 10k+ audiobook libraries where scans can take hours. If we utilize all the cores of your processor instead of just one then the scan time can be a fraction of what it is now.

Currently how it works is it splits up all the audio files that need to be scanned into batches of at most 2.5GB. So if you have a 1GB audio file and 3 500MB audio files that would make up a single batch. Those batches are run synchronously where each batch will execute ffprobe on each audio file asynchronously on a single thread.
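
The batching described above could look roughly like the sketch below. This is not the actual scanner code: the greedy, in-order grouping, the function name `splitIntoBatches`, the `{ path, size }` file shape and the decimal definition of a gigabyte are all assumptions made for the example.

```javascript
const GB = 1000 ** 3 // assumption: decimal gigabytes
const MAX_BATCH_BYTES = 2.5 * GB // the 2.5GB limit quoted above

// Greedy, in-order grouping: start a new batch when the next file
// would push the current batch over the limit. A single file larger
// than the limit ends up alone in its own batch.
function splitIntoBatches(files, maxBytes = MAX_BATCH_BYTES) {
  const batches = []
  let current = []
  let currentBytes = 0
  for (const file of files) {
    if (current.length > 0 && currentBytes + file.size > maxBytes) {
      batches.push(current)
      current = []
      currentBytes = 0
    }
    current.push(file)
    currentBytes += file.size
  }
  if (current.length > 0) batches.push(current)
  return batches
}

// One 1GB file plus three 500MB files fill a batch exactly; a fifth file spills over.
const batches = splitIntoBatches([
  { path: 'a.m4b', size: 1 * GB },
  { path: 'b.mp3', size: 0.5 * GB },
  { path: 'c.mp3', size: 0.5 * GB },
  { path: 'd.mp3', size: 0.5 * GB },
  { path: 'e.mp3', size: 0.5 * GB }
])
console.log(batches.map((batch) => batch.length)) // [ 4, 1 ]
```

Note that a size limit alone does not bound the number of processes: a batch of many small files still starts one ffprobe per file.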

How much RAM is used on an ffprobe would be highly dependent on the size of the audio file which is why I chose to split up the batches by file size.

My proposal to increase performance would be to run the ffprobe commands in parallel on X threads where X would be the number of processor cores. 4-core processor would start 4 threads where each thread is executing a single ffprobe. I think I'm actually agreeing with you but just proposing we spread the workload across the processor.

I'm a fan of projects that are highly customizable as long as it doesn't look like a jumbled up mess in the UI, so I think having a setting to adjust the variable X could be a nice addition.

adam commented

2026-04-24 23:02:35 +02:00

@advplyr commented on GitHub (Jun 22, 2022):

I want to add that the asynchronous/synchronous might be a bit misleading because in node.js asynchronous doesn't mean running in parallel. Whereas in other languages these could be used interchangeably.
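
A small sketch of that distinction, using made-up task names: two asynchronous tasks take turns on the one JavaScript thread instead of running at the same instant. `setImmediate` stands in here for waiting on I/O or on a child process.

```javascript
// Two "asynchronous" tasks interleave on the same thread.
const log = []

async function task(name) {
  for (let step = 1; step <= 2; step++) {
    log.push(`${name} step ${step}`)
    // Yield to the event loop, as waiting on I/O would.
    await new Promise((resolve) => setImmediate(resolve))
  }
}

const done = Promise.all([task('A'), task('B')]).then(() => log)
done.then((lines) => console.log(lines))
// [ 'A step 1', 'B step 1', 'A step 2', 'B step 2' ]
```

This applies to the JavaScript code only. Child processes started from that code are separate OS processes and can still run concurrently with each other, which is how dozens of ffprobe processes can exist at once.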

adam commented

2026-04-24 23:02:35 +02:00

@hobesman commented on GitHub (Jun 22, 2022):

> I want to add that the asynchronous/synchronous might be a bit misleading because in node.js asynchronous doesn't mean running in parallel. Whereas in other languages these could be used interchangeably.

All the more so because in common parlance, synchronous means "at the same time" and asynchronous means "not at the same time".

adam commented

2026-04-24 23:02:36 +02:00

@advplyr commented on GitHub (Jun 22, 2022):

> > I want to add that the asynchronous/synchronous might be a bit misleading because in node.js asynchronous doesn't mean running in parallel. Whereas in other languages these could be used interchangeably.
>
> All the more so because in common parlance, synchronous means "at the same time" and asynchronous means "not at the same time"

I've been doing so much node.js the last few years that I didn't realize how confusing it would sound at first. Node.js was built to run on a single thread and so "asynchronous" is really saying "handle this workload as optimally as you can, I don't care in which order you do it".

adam commented

2026-04-24 23:02:36 +02:00

@hobesman commented on GitHub (Jun 22, 2022):

I remember encountering this for the first time with Core for SmartThings, the home automation platform. The user can create complex conditions with if and if not and xor and so on for controlling smart devices. I remember thinking "who came up with this for synchronous commands to mean sequential and asynchronous means it can run in parallel?" It was very counterintuitive for the uninitiated user. Even if that's the term ultimately used in the underlying code, I agree the discussion and/or interface should probably avoid those terms.

adam commented

2026-04-24 23:02:37 +02:00

@rasmuslos commented on GitHub (Jun 23, 2022):

You can summon workers with NodeJS. These will run on a different thread but can communicate with the master

https://nodejs.org/api/worker_threads.html

adam commented

2026-04-24 23:02:38 +02:00

@advplyr commented on GitHub (Jun 23, 2022):

> You can summon workers with NodeJS. These will run on a different thread but can communicate with the master
>
> https://nodejs.org/api/worker_threads.html

We use worker threads for making M4b files already. https://github.com/advplyr/audiobookshelf/blob/master/server/managers/AbMergeManager.js#L186

We just need to re-build the scanner to use them.
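
As a rough idea of what handing work to a worker thread looks like with the `worker_threads` module, here is a self-contained sketch. It does not show how AbMergeManager or the scanner is written: `runInWorker` and `workerSource` are hypothetical names, and the worker only echoes some data back instead of probing a file.

```javascript
const { Worker } = require('worker_threads')

// Worker body kept as a string so the sketch fits in one file;
// a real worker would normally live in its own module.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads')
  // Stand-in for the real work (for example, probing one file).
  parentPort.postMessage({ path: workerData.path, length: workerData.path.length })
`

// Start one worker for one job and resolve with the message it sends back.
function runInWorker(path) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { path } })
    worker.once('message', resolve)
    worker.once('error', reject)
  })
}

const result = runInWorker('book.m4b')
result.then((message) => console.log(message))
```

The main thread stays free while the worker runs, and the two sides only exchange messages.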

adam commented

2026-04-24 23:02:38 +02:00

@cnu80 commented on GitHub (Jan 12, 2023):

Hi, I have the same issue with audiobookshelf on my Synology NAS. Too many ffprobes:

![image](https://user-images.githubusercontent.com/28894450/212156813-20297d11-beb2-48c0-a164-b989b542c838.png)
![image](https://user-images.githubusercontent.com/28894450/212156922-86a00819-719a-4555-9909-4f1bcbb1edfc.png)

A max. count parameter would be great. Thanks

adam commented

2026-04-24 23:02:40 +02:00

@advplyr commented on GitHub (Mar 29, 2023):

I missed this issue but this was fixed a few versions ago. Basically we are capping the number of ffprobe processes.

Reference: starred/audiobookshelf#261