[PR #5036] [MERGED] Improved subtitle parsing to account for bare colon in title #4398

Closed
opened 2026-04-25 00:19:36 +02:00 by adam · 0 comments

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/5036
Author: @kctdfh
Created: 2/6/2026
Status: Merged
Merged: 3/8/2026
Merged by: @advplyr

Base: master ← Head: Slightly-better-subtitle-parsing-logic


📝 Commits (1)

  • c15cb48 Improved subtitle parsing to account for bare colon in title

📊 Changes

4 files changed (+22 additions, -5 deletions)

📝 server/finders/BookFinder.js (+4 -4)
📝 server/utils/parsers/parseNfoMetadata.js (+1 -1)
📝 test/server/finders/BookFinder.test.js (+3 -0)
📝 test/server/utils/parsers/parseNfoMetadata.test.js (+14 -0)

📄 Description

Brief summary

Improved subtitle parsing so that only ": " (a colon followed by a space) is treated as a title/subtitle separator. Previously, a bare ":" was matched, which incorrectly split titles like 10:04 (https://www.goodreads.com/book/show/20613582-10) into title "10" and subtitle "04".

Which issue is fixed?

This is an edge case that I encountered in my own server and decided to just fix it. I haven't seen any reports of this issue from other users.

In-depth Description

The subtitle parsing logic previously used a bare colon (:) to detect and split title/subtitle boundaries. This works for the vast majority of books (e.g. "Everything Is Miscellaneous: The Power of the New Digital Disorder"). However, this splitting logic fails for titles that contain a colon as part of the name itself (e.g. 10:04 by Ben Lerner, or something like "Making the Mission:Impossible Movies", which isn't a real book but serves as a good example).

This PR changes all subtitle-splitting logic to require a space after the colon (": ") before treating it as a separator. This matches the standard convention for title/subtitle formatting and avoids incorrect splits for titles with bare colons. Note that the subtitle-splitting logic of getSubtitle(folder) in scandir.js is unchanged because it depends on " - " only.
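The rule described above can be sketched as follows. This is a minimal illustration, not the code in BookFinder.js or parseNfoMetadata.js: the function name `splitTitleAndSubtitle` and its return shape are assumptions made for the example.

```javascript
// Hypothetical helper for illustration only; the name and return shape
// are assumptions and do not come from the audiobookshelf codebase.
function splitTitleAndSubtitle(fullTitle) {
  const separator = ': ' // a colon followed by a space
  const index = fullTitle.indexOf(separator)

  // No colon-space found: keep the whole string as the title.
  if (index === -1) {
    return { title: fullTitle, subtitle: null }
  }

  return {
    title: fullTitle.slice(0, index),
    subtitle: fullTitle.slice(index + separator.length)
  }
}

console.log(splitTitleAndSubtitle('The Great Gatsby: A Novel'))
// { title: 'The Great Gatsby', subtitle: 'A Novel' }
console.log(splitTitleAndSubtitle('10:04'))
// { title: '10:04', subtitle: null }
```

With a bare ":" as the separator, the second call would instead produce title "10" and subtitle "04", which is the behaviour this PR removes.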

Lastly, I've kept this PR narrowly focused on requiring a space after the colon in subtitle parsing, but some related improvements could be made in the future. One is handling the rare case where a combined book title + subtitle contains more than one colon. See this book (https://www.goodreads.com/book/show/6654264-code-name), which could be received as "Mission: Impossible Code Name: Judas", or this book (https://www.goodreads.com/book/show/27879233-humanoid-encounters), which is officially titled "Humanoid Encounters: 1995-1999: The Others amongst Us". These are narrow edge cases, but other multi-colon titles may exist because Kindle Direct Publishing apparently doesn't prevent a book title from including a colon (see this discussion: https://www.kdpcommunity.com/s/question/0D52T00005FHSzMSAX/book-title-has-2-colons-before-subtitle-on-amazon-sales-page?language=en_US). I propose no fix for these cases and don't have a strong opinion on them either. I think the current heuristic (i.e. everything after ": " is the subtitle) is good enough.
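To make the multi-colon case concrete, here is a small sketch comparing two possible policies. It assumes that "everything after ": " is the subtitle" means splitting at the first ": "; the helper `splitOnColonSpace` and its `useLastSeparator` option are invented for this comparison and are not part of the project.

```javascript
// Hypothetical helper for illustration only; not taken from the project's code.
// `useLastSeparator` is an invented option used to compare two policies.
function splitOnColonSpace(fullTitle, useLastSeparator = false) {
  const separator = ': '
  const index = useLastSeparator
    ? fullTitle.lastIndexOf(separator)
    : fullTitle.indexOf(separator)

  if (index === -1) return { title: fullTitle, subtitle: null }

  return {
    title: fullTitle.slice(0, index),
    subtitle: fullTitle.slice(index + separator.length)
  }
}

const example = 'Humanoid Encounters: 1995-1999: The Others amongst Us'

// Split at the first ": " (the assumed current heuristic).
console.log(splitOnColonSpace(example))
// { title: 'Humanoid Encounters', subtitle: '1995-1999: The Others amongst Us' }

// Split at the last ": " (one possible alternative).
console.log(splitOnColonSpace(example, true))
// { title: 'Humanoid Encounters: 1995-1999', subtitle: 'The Others amongst Us' }
```

Neither policy is right for every title. For "Mission: Impossible Code Name: Judas", splitting at the first ": " leaves only "Mission" as the title, which is part of why no fix is proposed here.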

How have you tested this?

  • All existing + new unit tests pass.
  • Verified that standard colon-space subtitles (e.g., "Anna Karenina: subtitle", "The Great Gatsby: A Novel") are still correctly split.
  • Verified that bare-colon titles (e.g., "10:04", "Making the Mission:Impossible Movies") are no longer incorrectly split.
  • Verified the changes in the web client by passing the Title in the nfo file and setting Title and Subtitle to null in the other active metadata priorities/sources, so that they come from the nfo file parser.

Screenshots

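Before: 1004_before (https://github.com/user-attachments/assets/c465577b-79b4-47c9-976f-c9085fcd8808)

After: 1004_after (https://github.com/user-attachments/assets/9613f5b8-6b98-4148-b213-f4078c435f2b)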


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

adam added the pull-request label 2026-04-25 00:19:36 +02:00
adam closed this issue 2026-04-25 00:19:36 +02:00
Reference: starred/audiobookshelf#4398