[Bug]: Audioboks with multiple cds are not properly added #530

Closed
opened 2026-04-24 23:12:10 +02:00 by adam · 2 comments
Owner

Originally created by @mediadev123 on GitHub (Jul 16, 2022).

Describe the issue

One audiobook has cd's subfolders as already described in https://github.com/advplyr/audiobookshelf/issues/393

What's special about my case is that the CDs seem to have a space in between them ("CD 1" instead of "CD1"), which causes the algorithm to not correctly label them. Grabbing through the code, it seems like the regex is all over the place and not really updated:

Here are lines where the incorrect regex is used:
https://github.com/advplyr/audiobookshelf/blob/d0af1c3c9ac49ae46d490a87d7f67754111e16f6/server/utils/scandir.js#L60
https://github.com/advplyr/audiobookshelf/blob/d0af1c3c9ac49ae46d490a87d7f67754111e16f6/server/utils/scandir.js#L111

Here an updated regex is used:
https://github.com/advplyr/audiobookshelf/blob/5446aea910b4d51c5320236421aa9f668b65bdc4/server/scanner/AudioFileScanner.js#L25
https://github.com/advplyr/audiobookshelf/blob/5446aea910b4d51c5320236421aa9f668b65bdc4/server/scanner/AudioFileScanner.js#L32
https://github.com/advplyr/audiobookshelf/blob/8894f5243972e10e2b400dd7c3bf338ca8867068/server/scanner/MediaFileScanner.js#L26https://github.com/advplyr/audiobookshelf/blob/8894f5243972e10e2b400dd7c3bf338ca8867068/server/scanner/MediaFileScanner.js#L33

But a few lines below another incorrect one is used again:
https://github.com/advplyr/audiobookshelf/blob/5446aea910b4d51c5320236421aa9f668b65bdc4/server/scanner/AudioFileScanner.js#L38 (no check for disc)
https://github.com/advplyr/audiobookshelf/blob/8894f5243972e10e2b400dd7c3bf338ca8867068/server/scanner/MediaFileScanner.js#L38-L39

I would highly suggest to create a constant for the regex and unify it to something like this:

const CDSubfolderRegexes = [/\bdis[ck]\s*(?<num>\d{1,3})\b/i, /\bcd\s*(?<num>\d{1,3})\b/i]

Those regex would catch the following formats: "CD(any space any more times)(any number, from 0 to 999)", "disk(any space any more times)(any number, from 0 to 999)", "disc(any space any more times)(any number, from 0 to 999)". By using \s instead of a space, we allow any other weird space characters (like the non-line breaking, a tab or even a new line -- you never know). This allows multiple spaces (in case someone misnames the CD). This allows the alternate spelling of disc, disk. And by using a named group, the regex is more forward compatible in case there are more groups in the future:

For example:

var discFromFolder = match(t).groups["num"]
// Instead of 
var discFromFolder = Number(pathdir.replace(/cd/i, ''))

Lastly, it would be great if this regex is configurable so that users can add their own version easily, in case their audiobook uses a regional name for disc, such as disco in Spanish (I think). This is why I used a regex array instead of a more complex regex -- they scale way better IMO.

I would like to help but my javascript is not good enough for doing a refactor.

Steps to reproduce the issue

  1. Use any audiobook with a space in between the "CD " and number

Audiobookshelf version

2.0.24_amd64.deb

How are you running audiobookshelf?

Debian/PPA

Originally created by @mediadev123 on GitHub (Jul 16, 2022). ### Describe the issue One audiobook has cd's subfolders as already described in https://github.com/advplyr/audiobookshelf/issues/393 What's special about my case is that the CDs seem to have a space in between them ("CD 1" instead of "CD1"), which causes the algorithm to not correctly label them. Grabbing through the code, it seems like the regex is all over the place and not really updated: Here are lines where the incorrect regex is used: https://github.com/advplyr/audiobookshelf/blob/d0af1c3c9ac49ae46d490a87d7f67754111e16f6/server/utils/scandir.js#L60 https://github.com/advplyr/audiobookshelf/blob/d0af1c3c9ac49ae46d490a87d7f67754111e16f6/server/utils/scandir.js#L111 Here an updated regex is used: https://github.com/advplyr/audiobookshelf/blob/5446aea910b4d51c5320236421aa9f668b65bdc4/server/scanner/AudioFileScanner.js#L25 https://github.com/advplyr/audiobookshelf/blob/5446aea910b4d51c5320236421aa9f668b65bdc4/server/scanner/AudioFileScanner.js#L32 https://github.com/advplyr/audiobookshelf/blob/8894f5243972e10e2b400dd7c3bf338ca8867068/server/scanner/MediaFileScanner.js#L26https://github.com/advplyr/audiobookshelf/blob/8894f5243972e10e2b400dd7c3bf338ca8867068/server/scanner/MediaFileScanner.js#L33 But a few lines below another incorrect one is used again: https://github.com/advplyr/audiobookshelf/blob/5446aea910b4d51c5320236421aa9f668b65bdc4/server/scanner/AudioFileScanner.js#L38 (no check for disc) https://github.com/advplyr/audiobookshelf/blob/8894f5243972e10e2b400dd7c3bf338ca8867068/server/scanner/MediaFileScanner.js#L38-L39 I would highly suggest to create a constant for the regex and unify it to something like this: ``` const CDSubfolderRegexes = [/\bdis[ck]\s*(?<num>\d{1,3})\b/i, /\bcd\s*(?<num>\d{1,3})\b/i] ``` Those regex would catch the following formats: "CD(any space any more times)(any number, from 0 to 999)", "disk(any space any more times)(any number, from 0 to 999)", "disc(any space any more times)(any number, from 0 to 999)". By using `\s` instead of a space, we allow any other weird space characters (like the non-line breaking, a tab or even a new line -- you never know). This allows multiple spaces (in case someone misnames the CD). This allows the alternate spelling of disc, disk. And by using a named group, the regex is more forward compatible in case there are more groups in the future: For example: ``` var discFromFolder = match(t).groups["num"] // Instead of var discFromFolder = Number(pathdir.replace(/cd/i, '')) ``` Lastly, it would be great if this regex is configurable so that users can add their own version easily, in case their audiobook uses a regional name for disc, such as disco in Spanish (I think). This is why I used a regex array instead of a more complex regex -- they scale way better IMO. I would like to help but my javascript is not good enough for doing a refactor. ### Steps to reproduce the issue 1. Use any audiobook with a space in between the "CD " and number ### Audiobookshelf version 2.0.24_amd64.deb ### How are you running audiobookshelf? Debian/PPA
adam added the bug label 2026-04-24 23:12:10 +02:00
adam closed this issue 2026-04-24 23:12:11 +02:00
Author
Owner

@advplyr commented on GitHub (Jul 23, 2022):

Can you share what your exact file path was that was not properly detected? Spaces are supported already when parsing the CD from the audio file name.

There is some confusion in your post about what the various regexes are doing for CD. One thing to note is the AudioFileScanner was removed and replaced with the MediaFileScanner but the code parsing the CD is the same.

The MediaFileScanner will attempt to parse out a track and/or cd number from the filename. The only purpose of this is to properly order the tracks so for example Track 10 CD 1 would go before Track 1 CD 2.
Example audio filenames supported
audiobook CD01.mp3
audiobook CD 01.mp3
audiobook Disc 1.mp3
audiobook Disc10.mp3

In order to support another file structure sometimes used, I added support for detecting CD subfolders.
For example
/Animal Farm/CD01/track1.mp3
/Animal Farm/CD2/track1.mp3

This is more restrictive in that it doesn't allow a space but that could be updated. I don't think this is what you are asking for though.

I would like to add support for customizing the scanner folder structure but there is a lot of planning that has to go into that since there are so many edge cases. It has been brought up a few times and I like the suggestion to use grok-style syntax for defining path patterns #774

@advplyr commented on GitHub (Jul 23, 2022): Can you share what your exact file path was that was not properly detected? Spaces are supported already when parsing the CD from the audio file name. There is some confusion in your post about what the various regexes are doing for CD. One thing to note is the AudioFileScanner was removed and replaced with the MediaFileScanner but the code parsing the CD is the same. The MediaFileScanner will attempt to parse out a track and/or cd number from the filename. The only purpose of this is to properly order the tracks so for example Track 10 CD 1 would go before Track 1 CD 2. Example audio filenames supported `audiobook CD01.mp3` `audiobook CD 01.mp3` `audiobook Disc 1.mp3` `audiobook Disc10.mp3` In order to support another file structure sometimes used, I added support for detecting CD subfolders. For example `/Animal Farm/CD01/track1.mp3` `/Animal Farm/CD2/track1.mp3` This is more restrictive in that it doesn't allow a space but that could be updated. I don't think this is what you are asking for though. I would like to add support for customizing the scanner folder structure but there is a lot of planning that has to go into that since there are so many edge cases. It has been brought up a few times and I like the suggestion to use grok-style syntax for defining path patterns #774
Author
Owner

@mediadev123 commented on GitHub (Aug 22, 2022):

Hello, sorry for taking so long to respond. I must have missed your notification. The complete paths are: /mnt/media/audiobooks/AuthorFirstName AuthorLastName Collection/AuthorLastName, AuthorFirstName - AudiobookTitle/CD {1..13}/01 - Track 1.mp3

You have already pointed out the difference between MediaFileScanner and AudioFileScanner, but what exactly does scandir.js do then? Anyhow, I still think the regex should be more flexible while you work on #774 as it seems to be more complex.

Thanks for your detailed answers, though!

@mediadev123 commented on GitHub (Aug 22, 2022): Hello, sorry for taking so long to respond. I must have missed your notification. The complete paths are: `/mnt/media/audiobooks/AuthorFirstName AuthorLastName Collection/AuthorLastName, AuthorFirstName - AudiobookTitle/CD {1..13}/01 - Track 1.mp3` You have already pointed out the difference between `MediaFileScanner` and `AudioFileScanner`, but what exactly does scandir.js do then? Anyhow, I still think the regex should be more flexible while you work on #774 as it seems to be more complex. Thanks for your detailed answers, though!
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/audiobookshelf#530