[Bug]: Too heavy on the I/O during library scan #261

Closed
opened 2026-04-24 23:02:30 +02:00 by adam · 16 comments
Owner

Originally created by @Inrego on GitHub (Apr 8, 2022).

Describe the issue

When scanning the library, it's way too heavy on the I/O.
My audiobooks are hosted on Google Drive and mounted with rclone. When scanning a library folder with only about 10 books in it, the whole server freezes completely. I can't even reboot the server properly; I have to power-cycle it.

Audiobookshelf is running in a Docker container, and since the freeze happens during a library scan, I assume it's related to I/O.

I've been running this same setup for a few years without issues. Only Audiobookshelf causes this kind of problem.
Other similar services running in the same setup don't run into it:

  • Plex
  • Emby
  • Jellyfin
  • Radarr
  • Sonarr

Steps to reproduce the issue

  1. Mount GDrive with rclone
  2. Start the audiobookshelf Docker container with the GDrive folder mounted.
  3. Scan library

Audiobookshelf version

v1.7.2

How are you running audiobookshelf?

Docker

adam added the bug label 2026-04-24 23:02:30 +02:00
adam closed this issue 2026-04-24 23:02:30 +02:00

@Inrego commented on GitHub (Apr 8, 2022):

If I add only a few books to the mounted folder at a time, it can scan them. However, it still freezes for a while (around 30 seconds per book).

@advplyr commented on GitHub (Apr 24, 2022):

This was improved somewhat in v2, but it still needs work.

@lduesing commented on GitHub (Jun 22, 2022):

The problem is that you run ffprobe on every book you find, all in parallel. When I tried to reproduce the problem, I had more than 80 ffprobe processes...

@lduesing commented on GitHub (Jun 22, 2022):

Solution:
In the documentation of node-ffprobe (https://www.npmjs.com/package/node-ffprobe):

> Additionnally, you can set ffprobe.SYNC to true if you want for a particular reason to launch ffprobe synchronously (for example when used in batch processing of files to avoid too many spawns at once.)

@Inrego commented on GitHub (Jun 22, 2022):

That's great. I was about to suggest using a semaphore, with a setting to control the maximum number of processes. But if node-ffprobe already handles this with a simple parameter, I guess that's the easier fix!
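A minimal sketch of the semaphore idea, assuming a TypeScript/Node.js setup. `Semaphore`, `probeFile`, and the limit of 4 are hypothetical names and values for illustration, not audiobookshelf's actual code; the ffprobe flags used (`-v quiet`, `-print_format json`, `-show_format`) are standard ffprobe options. The semaphore only lets `max` callers run at once and hands each freed slot to the next waiter in FIFO order.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical counting semaphore: at most `max` holders at once; others wait in FIFO order.
class Semaphore {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly max: number;

  constructor(max: number) {
    if (max < 1) throw new RangeError("max must be at least 1");
    this.max = max;
  }

  async acquire(): Promise<void> {
    if (this.active < this.max) {
      this.active++;
      return;
    }
    // No free slot: park until release() hands one over.
    await new Promise<void>((resolve) => this.waiting.push(() => resolve()));
  }

  release(): void {
    const next = this.waiting.shift();
    if (next) {
      next(); // pass the slot straight to the next waiter; `active` is unchanged
    } else {
      this.active--;
    }
  }

  // Run `task` while holding a slot, releasing it even if the task throws.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// Hypothetical setting: the maximum number of ffprobe processes alive at once.
const probeLimit = new Semaphore(4);

// Probe one file without ever exceeding `probeLimit` concurrent ffprobe processes.
async function probeFile(path: string): Promise<unknown> {
  return probeLimit.run(async () => {
    const { stdout } = await execFileAsync("ffprobe", [
      "-v", "quiet", "-print_format", "json", "-show_format", path,
    ]);
    return JSON.parse(stdout);
  });
}
```

Exposing the constructor argument as a user setting would give the "max number of processes" knob described above.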

@advplyr commented on GitHub (Jun 22, 2022):

I don't think we want to run them synchronously. The heavy I/O comes from many ffprobe processes running at once, which I improved somewhat in v2, but we only use a single thread. We could further reduce the number of ffprobe processes running asynchronously and also split them across multiple threads.

@lduesing commented on GitHub (Jun 22, 2022):

Sorry, but each ffprobe process uses roughly 120 MiB of virtual RAM. With a directory of 80 audiobooks on a Raspberry Pi running Docker, the container gets killed. Scanning doesn't happen every day, so I don't see why serialized scanning would be a problem.
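A rough check of the figures in the comment above, taking the quoted ~120 MiB per ffprobe process at face value for 80 simultaneous probes. Virtual size usually overstates what a process actually keeps resident, so treat this as a worst-case estimate rather than a measurement:

```latex
80 \times 120\,\text{MiB} = 9600\,\text{MiB} = \frac{9600}{1024}\,\text{GiB} \approx 9.4\,\text{GiB}
```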

@advplyr commented on GitHub (Jun 22, 2022):

I don't think the image being killed with 80 audiobooks is a common issue. It depends heavily on your hardware, of course, and one of the biggest causes of reduced scanner performance is using a remote file system.

We have users with 10k+ audiobook libraries where scans can take hours. If we use all the cores of your processor instead of just one, the scan time could be a fraction of what it is now.

Currently, the scanner splits all the audio files that need to be scanned into batches of at most 2.5 GB. For example, one 1 GB audio file and three 500 MB audio files would make up a single batch. The batches run one after another (synchronously), and within each batch ffprobe is executed on every audio file asynchronously, all from a single thread.

How much RAM is used on an ffprobe would be highly dependent on the size of the audio file which is why I chose to split up the batches by file size.
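A minimal sketch of the batching described above, assuming files are grouped in the order they were found and that "2.5GB" means 2.5 × 10^9 bytes. `AudioFile`, `batchBySize`, and `scanInBatches` are hypothetical names, not audiobookshelf's actual scanner code.

```typescript
// Hypothetical file record; only the size matters for batching.
interface AudioFile {
  path: string;
  size: number; // bytes
}

const MAX_BATCH_BYTES = 2.5e9; // "at most 2.5GB", read here as decimal gigabytes

// Greedily group files, in order, into batches whose total size stays <= maxBytes.
// A single file larger than maxBytes still gets a batch of its own.
function batchBySize(files: AudioFile[], maxBytes = MAX_BATCH_BYTES): AudioFile[][] {
  const batches: AudioFile[][] = [];
  let current: AudioFile[] = [];
  let currentBytes = 0;

  for (const file of files) {
    if (current.length > 0 && currentBytes + file.size > maxBytes) {
      batches.push(current); // this file would overflow the batch, so close it
      current = [];
      currentBytes = 0;
    }
    current.push(file);
    currentBytes += file.size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Batches run one after another; every file inside a batch is probed concurrently.
async function scanInBatches(
  files: AudioFile[],
  probe: (file: AudioFile) => Promise<void>,
): Promise<void> {
  for (const batch of batchBySize(files)) {
    await Promise.all(batch.map(probe));
  }
}
```

With the example from the comment (one 1 GB file plus three 500 MB files), `batchBySize` returns a single batch of exactly 2.5 GB; a fifth 500 MB file would start a second batch. Grouping by bytes rather than file count bounds the work per batch, but every file inside a batch is still launched at once.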

My proposal to increase performance would be to run the ffprobe commands in parallel on X threads, where X is the number of processor cores. A 4-core processor would start 4 threads, each executing a single ffprobe at a time. I think I'm actually agreeing with you, just proposing that we spread the workload across the processor.

I'm a fan of projects that are highly customizable as long as it doesn't look like a jumbled up mess in the UI, so I think having a setting to adjust the variable X could be a nice addition.
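A minimal sketch of the "X at a time, X = number of cores" idea. It swaps in a simpler technique than worker threads: a pool of async "lanes" that caps how many tasks are in flight. Because each ffprobe runs as its own OS process, capping in-flight probes at the core count already spreads them across cores. `mapWithLimit` is a hypothetical helper, and the `limit` parameter stands in for the adjustable setting mentioned above.

```typescript
import { cpus } from "node:os";

// Run `task` over every item with at most `limit` tasks in flight at once.
// Results keep the input order. The default limit is the number of logical
// CPU cores reported by os.cpus() (falling back to 1 if it reports none).
async function mapWithLimit<T, R>(
  items: readonly T[],
  task: (item: T) => Promise<R>,
  limit = Math.max(1, cpus().length),
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let nextIndex = 0; // shared safely: all lanes run on the same JS thread

  // Each lane repeatedly claims the next unstarted item until none remain,
  // so at most `limit` tasks (e.g. ffprobe child processes) exist at once.
  async function lane(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await task(items[index]);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Worker threads would mainly pay off if the JavaScript side of the scan (parsing and processing probe output) became the bottleneck; for spawning external ffprobe processes, a concurrency cap like this is often enough.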

@advplyr commented on GitHub (Jun 22, 2022):

I want to add that the terms asynchronous/synchronous might be a bit misleading here, because in Node.js asynchronous doesn't mean running in parallel, whereas in other languages the two are often used interchangeably.

@hobesman commented on GitHub (Jun 22, 2022):

> I want to add that the asynchronous/synchronous might be a bit misleading because in node.js asynchronous doesn't mean running in parallel. Whereas in other languages these could be used interchangeably.

All the more so because in common parlance, synchronous means "at the same time" and asynchronous means "not at the same time".

@advplyr commented on GitHub (Jun 22, 2022):

> > I want to add that the asynchronous/synchronous might be a bit misleading because in node.js asynchronous doesn't mean running in parallel. Whereas in other languages these could be used interchangeably.
>
> All the more so because in common parlance, synchronous means "at the same time" and asynchronous means "not at the same time"

I've been doing so much Node.js over the last few years that I didn't realize how confusing it would sound at first. Node.js was built to run on a single thread, so "asynchronous" really says "handle this workload as optimally as you can; I don't care in which order you do it".
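To illustrate the terminology point, a small sketch with hypothetical function names: two `async` tasks started together take turns on the one JavaScript thread instead of running in parallel, and a CPU-bound loop inside an `async` function still blocks everything else until it finishes.

```typescript
// Each step logs its name, then `await`s, which hands control back so other
// pending continuations can run. Nothing here runs in parallel: both tasks
// share the single JavaScript thread and simply take turns.
async function countSteps(log: string[], name: string, steps: number): Promise<void> {
  for (let i = 0; i < steps; i++) {
    log.push(`${name}${i}`);
    await Promise.resolve();
  }
}

// Start two "asynchronous" tasks together and record the order their steps ran in.
async function runInterleaved(): Promise<string[]> {
  const log: string[] = [];
  await Promise.all([countSteps(log, "a", 2), countSteps(log, "b", 2)]);
  return log; // ["a0", "b0", "a1", "b1"]: interleaved, not simultaneous
}

// Marking a function `async` does not move its work off the thread:
// this loop runs to completion before any timer or I/O callback gets a turn.
async function busyWork(iterations: number): Promise<number> {
  let total = 0;
  for (let i = 0; i < iterations; i++) total += i;
  return total;
}
```

This is why "asynchronous" ffprobe calls in Node.js are not a concurrency limit by themselves: every call is started immediately, and the event loop merely interleaves waiting for them.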

@hobesman commented on GitHub (Jun 22, 2022):

I remember encountering this for the first time with Core for SmartThings, the home automation platform. The user can build complex conditions with if, if not, xor, and so on to control smart devices. I remember thinking, "Who decided that synchronous commands run sequentially and asynchronous ones can run in parallel?" It was very counterintuitive for the uninitiated user. Even if those are the terms ultimately used in the underlying code, I agree the discussion and/or interface should probably avoid them.

@rasmuslos commented on GitHub (Jun 23, 2022):

You can spawn workers with Node.js. They run on a separate thread but can communicate with the main thread:

https://nodejs.org/api/worker_threads.html
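A minimal sketch of the `worker_threads` API linked above. The worker body is passed as a string with `eval: true` so the example fits in one file; `sumInWorker` and the summing workload are hypothetical stand-ins for real scanner work, not how audiobookshelf structures its workers.

```typescript
import { Worker } from "node:worker_threads";

// Worker body, passed as a string with `eval: true` so the sketch fits in one file.
// It runs on its own thread, sums 1..n, and posts the result back to the main thread.
const workerSource = `
const { parentPort, workerData } = require("node:worker_threads");
let sum = 0;
for (let i = 1; i <= workerData.n; i++) sum += i;
parentPort.postMessage(sum);
`;

// Start one worker and resolve with the first message it sends back.
function sumInWorker(n: number): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once("message", (value: number) => resolve(value));
    worker.once("error", reject);
    worker.once("exit", (code: number) => {
      // Only matters if the worker died before posting a result.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Calling `sumInWorker` several times together, for example with `Promise.all`, runs each call on its own thread, which is the kind of split a rebuilt scanner would need for CPU-heavy JavaScript work.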

@advplyr commented on GitHub (Jun 23, 2022):

> You can summon workers with NodeJS. These will run on a different thread but can communicate with the master
>
> https://nodejs.org/api/worker_threads.html

We already use worker threads for making M4B files: https://github.com/advplyr/audiobookshelf/blob/master/server/managers/AbMergeManager.js#L186

We just need to rebuild the scanner to use them.

@cnu80 commented on GitHub (Jan 12, 2023):

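Hi, I have the same issue with audiobookshelf on my Synology NAS. Too many ffprobe processes:
Screenshot: https://user-images.githubusercontent.com/28894450/212156813-20297d11-beb2-48c0-a164-b989b542c838.png
Screenshot: https://user-images.githubusercontent.com/28894450/212156922-86a00819-719a-4555-9909-4f1bcbb1edfc.png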

A parameter for the maximum number of ffprobe processes would be great. Thanks!

@advplyr commented on GitHub (Mar 29, 2023):

I missed this issue, but it was fixed a few versions ago. Basically, we now cap the number of concurrent ffprobe processes.

Reference: starred/audiobookshelf#261