[PR #5119] feat: Auto generate chapters for podcasts that provide timestamps #4433

Open
opened 2026-04-25 00:19:44 +02:00 by adam · 0 comments

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/5119
Author: @harryr0se
Created: 3/11/2026
Status: 🔄 Open

Base: master ← Head: auto-generate-chapters-from-timestamps


📝 Commits (10+)

  • e8d65ce Commit first implementation of timestamp to chapter generation
  • b4b126e Add chapter title scraping and improve error logging
  • e096a04 Revert .devcontainer/devcontainer.json
  • 256c341 Update updating of end values to use new chaptersToPush temp array
  • bb7fcc1 Only use projects logger
  • 9d4a2a8 Improve chapter generation code and extract it into its own function
  • b3ba764 Add tests
  • bccf946 Merge branch 'advplyr:master' into auto-generate-chapters-from-timestamps
  • 32ea3e0 Update logging to use info for key logs, also use [PodcastEpisode] prefix to match other logs
  • 1e19bf3 Merge branch 'auto-generate-chapters-from-timestamps' of https://github.com/harryr0se/audiobookshelf into auto-generate-chapters-from-timestamps

📊 Changes

3 files changed (+291 additions, -0 deletions)


📝 server/models/PodcastEpisode.js (+13 -0)
➕ server/utils/parsers/parsePodcastDescriptionForChapters.js (+112 -0)
➕ test/server/utils/parsers/parsePodcastDescriptionForChapters.test.js (+166 -0)

📄 Description

Brief summary

This PR adds support for automatically generating chapters when a podcast episode provides timestamps in its description. It does this by scraping the description line by line and building up a chapter list.

Which issue is fixed?

I started working on this as it's something I really wanted, but I've found the following related issue:
https://github.com/advplyr/audiobookshelf/issues/2363

In-depth Description

If the newly added autoGenerateChapters field is true on the Podcast object, the generation code runs when ABS creates a PodcastEpisode object from a newly downloaded RSS podcast episode.

The generation steps:

  1. Break up the description into lines. Currently it splits on any of the following: </p>, <br /> or \n
  2. Iterate over each line, looking for a timestamp via regex
  3. If a line matches, we try to work out whether the timestamp contains an hour. It's common for descriptions to only start including hours once they tick over the hour mark, for example:
     • 00:00 Chapter 1
     • 30:00 Chapter 2
     • 1:04:14 Chapter 3
  4. We then calculate the chapter start time in seconds from this timestamp
  5. Extracting the title is a matter of a further regex, which attempts to find the text after the timestamp
  6. If other chapters have already been generated, we update the previous chapter's end value to this new chapter's start. This assumes that timestamps are sequential and contiguous
  7. Once out of the loop, we update the last chapter to end at the duration of the audio file
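The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in parsePodcastDescriptionForChapters.js. The function name generateChaptersSketch, the TIMESTAMP_RE regex, the title-cleanup regex and the chapter shape { id, start, end, title } are all assumptions made for this example. The split also tolerates <br> and <br/> in addition to the three delimiters listed. The error checks are deliberately left out here.

```javascript
// Hypothetical sketch of steps 1-7; names and regexes are assumptions, not the PR's code.

// Optional hour group followed by mm:ss, e.g. "00:00", "30:00", "1:04:14".
const TIMESTAMP_RE = /(?:(\d{1,2}):)?(\d{1,2}):(\d{2})/

function generateChaptersSketch(description, durationSeconds) {
  // Step 1: split on </p>, <br /> (also <br>, <br/>) and newlines.
  const lines = description.split(/<\/p>|<br\s*\/?>|\n/i)
  const chapters = []

  // Step 2: look for a timestamp on each line.
  for (const line of lines) {
    const match = TIMESTAMP_RE.exec(line)
    if (!match) continue

    // Step 3: the hour group is only present for timestamps that include hours.
    const hours = match[1] !== undefined ? Number(match[1]) : 0
    const minutes = Number(match[2])
    const seconds = Number(match[3])

    // Step 4: chapter start time in seconds.
    const start = hours * 3600 + minutes * 60 + seconds

    // Step 5: title = text after the timestamp, with HTML tags and leading separators removed.
    const rest = line.slice(match.index + match[0].length)
    const title = rest.replace(/<[^>]*>/g, '').replace(/^[\s\-:|)\]]+/, '').trim()

    // Step 6: the previous chapter ends where this one starts (assumes sequential, contiguous timestamps).
    if (chapters.length > 0) chapters[chapters.length - 1].end = start

    // Placeholder end; overwritten by the next chapter or by step 7.
    chapters.push({ id: chapters.length, start, end: start, title })
  }

  // Step 7: the last chapter runs to the end of the audio file.
  if (chapters.length > 0) chapters[chapters.length - 1].end = durationSeconds
  return chapters
}
```

For the example timestamps above, embedded in <p> tags with a 4000-second episode, this sketch yields chapters starting at 0, 1800 and 3854 seconds. The last chapter ends at 4000.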

Error checking

I believe this sort of feature should be quite conservative: if we are unsure about the state of any given timestamp, we should bail out of the entire process for that podcast episode. This is particularly important because we treat the timestamps as neighboring chapters (each chapter ends where the next one starts), so errors could propagate.

This implementation currently has the following error handling:

  1. Throw on basic argument null checks
  2. Throw if we're unable to scrape the title of a given chapter
  3. Throw if we're only able to find one chapter (perhaps this isn't required, but a single chapter seems unhelpful and could indicate a parsing failure)
  4. Throw if any timestamp is past the end of the audio file
  5. Throw if any timestamp has minutes or seconds over 59
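The checks listed above can be illustrated with a small sketch. It assumes a few things. The names timestampToSeconds, validateChapters and chaptersOrNone are hypothetical. The chapter shape { start, end, title } is also assumed. Finally, it assumes that "bail out" means a caller catches any throw and keeps no generated chapters for the episode. The PR's actual call site in PodcastEpisode.js may be wired differently.

```javascript
// Hypothetical sketch of the conservative checks; not the PR's implementation.

// Check 5: reject minutes or seconds over 59 before converting to seconds.
function timestampToSeconds(hours, minutes, seconds) {
  if (minutes > 59 || seconds > 59) {
    throw new Error(`Invalid timestamp ${hours}:${minutes}:${seconds}`)
  }
  return hours * 3600 + minutes * 60 + seconds
}

// Checks 1-4 applied to an already-assembled chapter list.
function validateChapters(description, durationSeconds, chapters) {
  // Check 1: basic argument null checks.
  if (description == null || durationSeconds == null || chapters == null) {
    throw new Error('Missing description, duration or chapters')
  }
  // Check 2: every chapter must have a scraped title.
  if (chapters.some((c) => !c.title)) throw new Error('Could not scrape a chapter title')
  // Check 3: a single chapter is treated as a probable parsing failure.
  if (chapters.length === 1) throw new Error('Only found one chapter')
  // Check 4: no timestamp may start past the end of the audio file.
  if (chapters.some((c) => c.start > durationSeconds)) {
    throw new Error('Timestamp past the end of the audio file')
  }
  return chapters
}

// Hypothetical call site: any throw abandons chapter generation for the whole episode.
function chaptersOrNone(generate) {
  try {
    return generate()
  } catch (err) {
    // A real call site would presumably log err; this sketch just falls back to no chapters.
    return []
  }
}
```

Keeping the checks as throws and catching them in one place means that a single suspicious timestamp discards the whole list. No partially wrong set of neighboring chapters is kept.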

How have you tested this?

I have added a new test suite for this scraping code, covering a number of success and failure cases.
All of the checks listed under "Error checking" above should be covered by the tests.

I've also been running my fork with this change for nearly a week, and it's working well on the 3 podcasts I subscribe to that provide timestamps.

Screenshots

Web interface

(2 screenshots)

iOS app beta

(1 screenshot)

Next steps

I wanted to open this PR to start a discussion with maintainers and get feedback.

I'm aware that a rewrite of the front end is ongoing, so I've tried to craft this PR as something that could land server-side and then be included in the new UI. In the meantime, it could be enabled on a per-podcast basis via the API.
It would be good to know whether that is something you'd be open to.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

adam added the pull-request label 2026-04-25 00:19:44 +02:00
Reference: starred/audiobookshelf#4433