[Bug]: Scanner regularly breaks on network file transfer #866

Closed
opened 2026-04-24 23:24:36 +02:00 by adam · 9 comments
Owner

Originally created by @lkiesow on GitHub (Jan 4, 2023).

Describe the issue

When I started using Audiobookshelf I've noticed that several of my imported books were not imported. Going through the logs, I found many errors related to imports which essentially boiled down to a failure to execute ffprobe successfully.

There are several other reports of something like this, like e.g. #1036, which suggest re-encoding the files to fix the metadata. But I found that just renaming the files would already fix the problem.

Later I also found that a lot of books which were successfully imported had an incorrect file size listed.

My suspicion was that the scanner would pick up and try to process files which had only been half-transferred to the server file system. Investigating the issue further, that seems to be true.

Steps to reproduce the issue

To see what happens, let's first generate an audio file. Not that this does not use the FFmpeg flag faststart. This essentially means that metadata will be at the end of the file in this mp4 container:

❯ ffmpeg -f lavfi -i sine=f=220:b=4 -t 2:00:00 -c:a aac -ar 48k -b:a 128k out.m4a

To verify that ffprobe fails when run on a half transferred file, we can just copy the first 100 MB to a new file and run ffprobe on that file:

❯ dd if=out.m4a of=part.m4a bs=10M count=10
❯ ffprobe part.m4a
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x563170cc2ac0] moov atom not found
part.m4a: Invalid data found when processing input

This is exactly what I expected to happen.
Next, let's see how Audiobookshelf handles these this. I used pv to simulate a slow network file transfer of 100 KB/s while copying the audio file to a library directory:

❯ mkdir -p audiobooks/x/y
❯ cat out.m4a | pv -L 100K > audiobooks/x/y/out.m4a

Let this run for a few minutes, then hit Ctrl + z to suspend this for a few minutes to simulate a slight network glich before resuming the process by running:

❯ fg

The copy process continues, but Audiobookshelf starts processing the file. Since the metadata is still missing from the file, ffprobe will fail and that means the import process will fail:

[2023-01-04 22:23:26] DEBUG: [Watcher] File Added /home/lars/dev/audiobookshelf/audiobooks/x/y/out.m4a
[2023-01-04 22:23:26] DEBUG: [Watcher] Modified file in library "test" and folder "fol_n3de763tzli5ez0qv9" with relPath "/x/y/out.m4a"
[2023-01-04 22:23:30]  INFO: [Server] 1 Files Changed
[2023-01-04 22:23:30] DEBUG: [Scanner] Scanning file update groups in folder "fol_n3de763tzli5ez0qv9" of library "test"
[2023-01-04 22:23:30] DEBUG: [Scanner] scanFolderUpdates fileUpdateGroup { 'x/y': [ 'out.m4a' ] }
[2023-01-04 22:23:30] DEBUG: [Scanner] Folder update group must be a new item "x/y" in library "test"
[2023-01-04 22:23:30] DEBUG: [Scanner] Scanning new item "/home/lars/dev/audiobookshelf/audiobooks/x/y"
[2023-01-04 22:23:30] ERROR: [MediaFileScanner] Invalid data found when processing input : "/home/lars/dev/audiobookshelf/audiobooks/x/y/out.m4a"
[2023-01-04 22:23:30]  WARN: [Scanner] Library item has no media files "/home/lars/dev/audiobookshelf/audiobooks/x/y"
[2023-01-04 22:23:30] DEBUG: [Scanner] Folder scan results { 'x/y': 0 }

Even if you update the file by copying the remaining data; either let the process finish or cancel it and run:

❯ cp out.m4a audiobooks/x/y/out.m4a

ABS will do nothing. The file will never be imported and the audiobook will be missing in the library, despite valid files being now available in the library directory.

But even if the metadata are already available, this leads to problems. As I mentioned above, I found that a lot of books have incorrect file sizes listed.

Let's generate a second file, this time using the -movflags faststart parameter:

❯ ffmpeg -f lavfi -i sine=f=220:b=4 -t 2:00:00 -c:a aac -ar 48k -b:a 128k -movflags faststart out-faststart.m4a

Now running ffprobe on the first part of the file will no longer fail:

❯ dd if=out-faststart.m4a of=part-faststart.m4a bs=10M count=10
❯ ffprobe part-faststart.m4a 
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'part-faststart.m4a':
  Duration: 02:00:00.00, start: 0.000000, bitrate: 116 kb/s
  Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 128 kb/s (default)

That's great, let's copy the file:

❯ mkdir -p audiobooks/y/z
❯ cat out-faststart.m4a | pv -L 100K > audiobooks/y/z/out-faststart.m4a

Again, wait a minute, hit Ctrl + z, then resume using fg if you like.
ABS will pick up the file again, but this time it will be imported:

[2023-01-04 22:37:03] DEBUG: [Watcher] File Added /home/lars/dev/audiobookshelf/audiobooks/y/z/out-faststart.m4a
[2023-01-04 22:37:03] DEBUG: [Watcher] Modified file in library "test" and folder "fol_n3de763tzli5ez0qv9" with relPath "/y/z/out-faststart.m4a"
[2023-01-04 22:37:07]  INFO: [Server] 1 Files Changed
[2023-01-04 22:37:07] DEBUG: [Scanner] Scanning file update groups in folder "fol_n3de763tzli5ez0qv9" of library "test"
[2023-01-04 22:37:07] DEBUG: [Scanner] scanFolderUpdates fileUpdateGroup { 'y/z': [ 'out-faststart.m4a' ] }
[2023-01-04 22:37:07] DEBUG: [Scanner] Folder update group must be a new item "y/z" in library "test"
[2023-01-04 22:37:07] DEBUG: [Scanner] Scanning new item "/home/lars/dev/audiobookshelf/audiobooks/y/z"
[2023-01-04 22:37:08] DEBUG: Library Item "/home/lars/dev/audiobookshelf/audiobooks/y/z" Media file scan took 107ms with 1 audio files averaging 107ms per MB
[2023-01-04 22:37:08] DEBUG: [Book] Tracks being rebuilt...!
[2023-01-04 22:37:08] DEBUG: [DB] Inserted 1 author
[2023-01-04 22:37:08] DEBUG: [LibraryItem] Success saving abmetadata to "/home/lars/dev/audiobookshelf/metadata/items/li_qavld3d21qbmcjuuua/metadata.abs"
[2023-01-04 22:37:08] DEBUG: [DB] Library Items inserted 1
[2023-01-04 22:37:08] DEBUG: [Scanner] Folder scan results { 'y/z': 1 }

Unfortunately, the listed file size in the client is 6.15 MB while the actual file size is about 112 MB.

How to fix

The scanner should allow for a file to not be modified for a few seconds before trying to pick it up. This would likely already catch a number of the problematic imports.

However, this would likely only reduce the number of errors, not get rid of them. To completely fix the issue, ABS should keep a hash alongside the file name of imported file and re-import the file automatically if it is modified again.

Workaround

I used scp for my first batch but switched to using rsync later on. Fortunately rsync copies files to a temporary file first which will not be picked up by ABS since it starts with a . and then move the file into place. This circumvents the problem. But using SMB, scp, FTP, … should all show this problem. Not for every file, but it will pop up eventually.

Audiobookshelf version

2.x

Originally created by @lkiesow on GitHub (Jan 4, 2023). ### Describe the issue When I started using Audiobookshelf I've noticed that several of my imported books were not imported. Going through the logs, I found many errors related to imports which essentially boiled down to a failure to execute `ffprobe` successfully. There are several other reports of something like this, like e.g. #1036, which suggest re-encoding the files to fix the metadata. But I found that just renaming the files would already fix the problem. Later I also found that a lot of books which were successfully imported had an incorrect file size listed. My suspicion was that the scanner would pick up and try to process files which had only been half-transferred to the server file system. Investigating the issue further, that seems to be true. ### Steps to reproduce the issue To see what happens, let's first generate an audio file. Not that this does not use the FFmpeg flag `faststart`. This essentially means that metadata will be at the end of the file in this mp4 container: ``` ❯ ffmpeg -f lavfi -i sine=f=220:b=4 -t 2:00:00 -c:a aac -ar 48k -b:a 128k out.m4a ``` To verify that `ffprobe` fails when run on a half transferred file, we can just copy the first 100 MB to a new file and run `ffprobe` on that file: ``` ❯ dd if=out.m4a of=part.m4a bs=10M count=10 ❯ ffprobe part.m4a [mov,mp4,m4a,3gp,3g2,mj2 @ 0x563170cc2ac0] moov atom not found part.m4a: Invalid data found when processing input ``` This is exactly what I expected to happen. Next, let's see how Audiobookshelf handles these this. I used `pv` to simulate a slow network file transfer of 100 KB/s while copying the audio file to a library directory: ``` ❯ mkdir -p audiobooks/x/y ❯ cat out.m4a | pv -L 100K > audiobooks/x/y/out.m4a ``` Let this run for a few minutes, then hit `Ctrl + z` to suspend this for a few minutes to simulate a slight network glich before resuming the process by running: ``` ❯ fg ``` The copy process continues, but Audiobookshelf starts processing the file. Since the metadata is still missing from the file, `ffprobe` will fail and that means the import process will fail: ``` [2023-01-04 22:23:26] DEBUG: [Watcher] File Added /home/lars/dev/audiobookshelf/audiobooks/x/y/out.m4a [2023-01-04 22:23:26] DEBUG: [Watcher] Modified file in library "test" and folder "fol_n3de763tzli5ez0qv9" with relPath "/x/y/out.m4a" [2023-01-04 22:23:30] INFO: [Server] 1 Files Changed [2023-01-04 22:23:30] DEBUG: [Scanner] Scanning file update groups in folder "fol_n3de763tzli5ez0qv9" of library "test" [2023-01-04 22:23:30] DEBUG: [Scanner] scanFolderUpdates fileUpdateGroup { 'x/y': [ 'out.m4a' ] } [2023-01-04 22:23:30] DEBUG: [Scanner] Folder update group must be a new item "x/y" in library "test" [2023-01-04 22:23:30] DEBUG: [Scanner] Scanning new item "/home/lars/dev/audiobookshelf/audiobooks/x/y" [2023-01-04 22:23:30] ERROR: [MediaFileScanner] Invalid data found when processing input : "/home/lars/dev/audiobookshelf/audiobooks/x/y/out.m4a" [2023-01-04 22:23:30] WARN: [Scanner] Library item has no media files "/home/lars/dev/audiobookshelf/audiobooks/x/y" [2023-01-04 22:23:30] DEBUG: [Scanner] Folder scan results { 'x/y': 0 } ``` Even if you update the file by copying the remaining data; either let the process finish or cancel it and run: ``` ❯ cp out.m4a audiobooks/x/y/out.m4a ``` ABS will do nothing. The file will never be imported and the audiobook will be missing in the library, despite valid files being now available in the library directory. But even if the metadata are already available, this leads to problems. As I mentioned above, I found that a lot of books have incorrect file sizes listed. Let's generate a second file, this time using the `-movflags faststart` parameter: ``` ❯ ffmpeg -f lavfi -i sine=f=220:b=4 -t 2:00:00 -c:a aac -ar 48k -b:a 128k -movflags faststart out-faststart.m4a ``` Now running `ffprobe` on the first part of the file will no longer fail: ``` ❯ dd if=out-faststart.m4a of=part-faststart.m4a bs=10M count=10 ❯ ffprobe part-faststart.m4a Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'part-faststart.m4a': Duration: 02:00:00.00, start: 0.000000, bitrate: 116 kb/s Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 128 kb/s (default) ``` That's great, let's copy the file: ``` ❯ mkdir -p audiobooks/y/z ❯ cat out-faststart.m4a | pv -L 100K > audiobooks/y/z/out-faststart.m4a ``` Again, wait a minute, hit `Ctrl + z`, then resume using `fg` if you like. ABS will pick up the file again, but this time it will be imported: ``` [2023-01-04 22:37:03] DEBUG: [Watcher] File Added /home/lars/dev/audiobookshelf/audiobooks/y/z/out-faststart.m4a [2023-01-04 22:37:03] DEBUG: [Watcher] Modified file in library "test" and folder "fol_n3de763tzli5ez0qv9" with relPath "/y/z/out-faststart.m4a" [2023-01-04 22:37:07] INFO: [Server] 1 Files Changed [2023-01-04 22:37:07] DEBUG: [Scanner] Scanning file update groups in folder "fol_n3de763tzli5ez0qv9" of library "test" [2023-01-04 22:37:07] DEBUG: [Scanner] scanFolderUpdates fileUpdateGroup { 'y/z': [ 'out-faststart.m4a' ] } [2023-01-04 22:37:07] DEBUG: [Scanner] Folder update group must be a new item "y/z" in library "test" [2023-01-04 22:37:07] DEBUG: [Scanner] Scanning new item "/home/lars/dev/audiobookshelf/audiobooks/y/z" [2023-01-04 22:37:08] DEBUG: Library Item "/home/lars/dev/audiobookshelf/audiobooks/y/z" Media file scan took 107ms with 1 audio files averaging 107ms per MB [2023-01-04 22:37:08] DEBUG: [Book] Tracks being rebuilt...! [2023-01-04 22:37:08] DEBUG: [DB] Inserted 1 author [2023-01-04 22:37:08] DEBUG: [LibraryItem] Success saving abmetadata to "/home/lars/dev/audiobookshelf/metadata/items/li_qavld3d21qbmcjuuua/metadata.abs" [2023-01-04 22:37:08] DEBUG: [DB] Library Items inserted 1 [2023-01-04 22:37:08] DEBUG: [Scanner] Folder scan results { 'y/z': 1 } ``` Unfortunately, the listed file size in the client is 6.15 MB while the actual file size is about 112 MB. ### How to fix The scanner should allow for a file to not be modified for a few seconds before trying to pick it up. This would likely already catch a number of the problematic imports. However, this would likely only reduce the number of errors, not get rid of them. To completely fix the issue, ABS should keep a hash alongside the file name of imported file and re-import the file automatically if it is modified again. ### Workaround I used `scp` for my first batch but switched to using `rsync` later on. Fortunately `rsync` copies files to a temporary file first which will not be picked up by ABS since it starts with a `.` and then move the file into place. This circumvents the problem. But using SMB, scp, FTP, … should all show this problem. Not for every file, but it will pop up eventually. ### Audiobookshelf version 2.x
adam added the bug label 2026-04-24 23:24:36 +02:00
adam closed this issue 2026-04-24 23:24:36 +02:00
Author
Owner

@advplyr commented on GitHub (Jan 4, 2023):

Thanks for tracking this down and finding a way to reproduce. Abs is using Watcher to listen for new/removed files in the folders.

I'll have to dig into that lib again to see when it is sending the "add" event for a new file but I did a quick skim of the configurable options and it didn't look like there is one for this case.

The library can also emit events for changes to the file but I had that disabled because it was emitting events for every metadata change (which includes viewing the file) (see). If we choose to continue using this library we can enable that again and just build out a more robust check on what of the file is has changed.

There is an update on this library I've been meaning to check on and incorporate. The first step is probably to re-evaluate this library.

@advplyr commented on GitHub (Jan 4, 2023): Thanks for tracking this down and finding a way to reproduce. Abs is using [Watcher](https://github.com/fabiospampinato/watcher) to listen for new/removed files in the folders. I'll have to dig into that lib again to see when it is sending the "add" event for a new file but I did a quick skim of the configurable options and it didn't look like there is one for this case. The library can also emit events for changes to the file but I had that disabled because it was emitting events for every metadata change (which includes viewing the file) ([see](https://github.com/advplyr/audiobookshelf/blob/master/server/Watcher.js#L46)). If we choose to continue using this library we can enable that again and just build out a more robust check on what of the file is has changed. There is an update on this library I've been meaning to check on and incorporate. The first step is probably to re-evaluate this library.
Author
Owner

@lkiesow commented on GitHub (Jan 5, 2023):

The library can also emit events for changes to the file but I had that disabled because it was emitting events for every metadata change (which includes viewing the file)

I could see that event getting triggered if you run “Edit Book” → “Tools” → “Embed Metadata”, but why does ABS modify the file when you view (play?) the file?

@lkiesow commented on GitHub (Jan 5, 2023): > The library can also emit events for changes to the file but I had that disabled because it was emitting events for every metadata change (which includes viewing the file) I could see that event getting triggered if you run “Edit Book” → “Tools” → “Embed Metadata”, but why does ABS modify the file when you view (play?) the file?
Author
Owner

@advplyr commented on GitHub (Jan 6, 2023):

Abs doesn't modify the file. I'm talking about the Watcher library can fire an event that the file changed just by viewing the file. This is because the file metadata stores the last time it was accessed.

I was just mentioning that as one reason why I disabled that event listener from Watcher. I hadn't built out a system of detecting what has changed on the file and I didn't want that firing every time the file is accessed.

@advplyr commented on GitHub (Jan 6, 2023): Abs doesn't modify the file. I'm talking about the Watcher library can fire an event that the file changed just by viewing the file. This is because the file metadata stores the last time it was accessed. I was just mentioning that as one reason why I disabled that event listener from Watcher. I hadn't built out a system of detecting what has changed on the file and I didn't want that firing every time the file is accessed.
Author
Owner

@JohanPotgieter commented on GitHub (Oct 1, 2023):

I have the same issue as described in #2010. When uploading books via ABS it's not added to the library automatically. You have to scan the library to see the new addition.

@JohanPotgieter commented on GitHub (Oct 1, 2023): I have the same issue as described in #2010. When uploading books via ABS it's not added to the library automatically. You have to scan the library to see the new addition.
Author
Owner

@advplyr commented on GitHub (Oct 1, 2023):

@JohanPotgieter Currently the upload feature relies on the watcher to scan in the uploaded files. If you are using a network file system there is a good chance the watcher will not work for you.

@advplyr commented on GitHub (Oct 1, 2023): @JohanPotgieter Currently the upload feature relies on the watcher to scan in the uploaded files. If you are using a network file system there is a good chance the watcher will not work for you.
Author
Owner

@kymahi commented on GitHub (Oct 25, 2023):

@advplyr When you say "network file system" do you mean a network-connected storage drive accessed by the server? Because I'm seeing this issue on a USB-connected storage drive as well. This also is a newer issue, as I didn't have this issue until about June of this year after using this for about a year before that. Maybe an update to the Watcher library?

@kymahi commented on GitHub (Oct 25, 2023): @advplyr When you say "network file system" do you mean a network-connected storage drive accessed by the server? Because I'm seeing this issue on a USB-connected storage drive as well. This also is a newer issue, as I didn't have this issue until about June of this year after using this for about a year before that. Maybe an update to the Watcher library?
Author
Owner

@advplyr commented on GitHub (Oct 25, 2023):

@kymahi Technically this is for any audio file that takes more than 4 seconds to copy. I just tested a possible solution by checking the file modified time and waiting for the modified time to stop changing before considering it as "added".

@advplyr commented on GitHub (Oct 25, 2023): @kymahi Technically this is for any audio file that takes more than 4 seconds to copy. I just tested a possible solution by checking the file modified time and waiting for the modified time to stop changing before considering it as "added".
Author
Owner

@advplyr commented on GitHub (Oct 25, 2023):

The solution I implemented is working for me. When new files are added the mtime of the file is checked every second until it stops changing.

@advplyr commented on GitHub (Oct 25, 2023): The solution I implemented is working for me. When new files are added the `mtime` of the file is checked every second until it stops changing.
Author
Owner

@advplyr commented on GitHub (Oct 29, 2023):

Fixed in v2.5.0

@advplyr commented on GitHub (Oct 29, 2023): Fixed in [v2.5.0](https://github.com/advplyr/audiobookshelf/releases/tag/v2.5.0)
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/audiobookshelf#866