[PR #5119] feat: Auto generate chapters for podcasts that provide timestamps #4433

2026-04-25T00:19:44+02:00

adam commented

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/5119
Author: @harryr0se
Created: 3/11/2026
Status: 🔄 Open

Base: master ← Head: auto-generate-chapters-from-timestamps

📝 Commits (10+)

e8d65ce Commit first implementation of timestamp to chapter generation
b4b126e Add chapter title scraping and improve error logging
e096a04 Revert .devcontainer/devcontainer.json
256c341 Update updating of end values to use new chaptersToPush temp array
bb7fcc1 Only use projects logger
9d4a2a8 Improve chapter generation code and extract it into its own function
b3ba764 Add tests
bccf946 Merge branch 'advplyr:master' into auto-generate-chapters-from-timestamps
32ea3e0 Update logging to use info for key logs, also use [PodcastEpisode] prefix to match other logs
1e19bf3 Merge branch 'auto-generate-chapters-from-timestamps' of https://github.com/harryr0se/audiobookshelf into auto-generate-chapters-from-timestamps

📊 Changes

3 files changed (+291 additions, -0 deletions)

View changed files

📝 server/models/PodcastEpisode.js (+13 -0)
➕ server/utils/parsers/parsePodcastDescriptionForChapters.js (+112 -0)
➕ test/server/utils/parsers/parsePodcastDescriptionForChapters.test.js (+166 -0)

📄 Description

Brief summary

This PR adds support for automatically generating chapters when a podcast episode provides timestamps in its description. It does this by scraping the description line by line and building up a chapter list.

Which issue is fixed?

I started working on this as it's something I really wanted, but I've found the following related issue:
https://github.com/advplyr/audiobookshelf/issues/2363

In-depth Description

If the newly added autoGenerateChapters field is true on the Podcast object, the generation code will run when ABS creates a PodcastEpisode object from a newly downloaded RSSPocastEpisode
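
To illustrate the opt-in behaviour described above, here is a minimal sketch of how the flag might gate generation. The function name `buildEpisodeChapters`, the injected `parseChapters` callback, and the `description` property are assumptions made for illustration, not the PR's actual code. Returning an empty chapter list when the parser throws is likewise an assumed way of bailing out for the whole episode.

```javascript
// Hypothetical wiring: buildEpisodeChapters and parseChapters are illustrative names, not the PR's.
function buildEpisodeChapters(podcast, rssEpisode, audioDuration, parseChapters, logger = console) {
  // The feature is opt-in per podcast via the autoGenerateChapters field
  if (!podcast.autoGenerateChapters) return []
  try {
    return parseChapters(rssEpisode.description, audioDuration)
  } catch (error) {
    // Assumed behaviour: any parse failure leaves the episode without generated chapters
    logger.info(`[PodcastEpisode] Skipping chapter generation: ${error.message}`)
    return []
  }
}
```

Passing the parser in as an argument keeps the sketch self-contained; in the PR the parser lives in `server/utils/parsers/parsePodcastDescriptionForChapters.js`.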

The generation steps:

1. Break up the description into lines; currently it splits on any of the following: ,   or \n
2. Iterating over each line, we look for a timestamp via regex
3. If we match, we try to work out whether the timestamp contains an hour or not. It's common for descriptions to only start including hours once they pass the hour mark, for example:

• 00:00 Chapter 1
• 30:00 Chapter 2
• 1:04:14 Chapter 3

4. We then calculate the chapter start time in seconds based on this timestamp
5. Extracting the title is a matter of a further regex which attempts to find text after the timestamp
6. If other chapters have already been generated, we update the last one's end value to be this new chapter's start. This assumes that timestamps will be sequential and contiguous
7. Once out of the loop, we update the last chapter to end at the duration of the audio file
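
The generation steps above can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: the regex, the function names, and the `{ id, start, end, title }` chapter shape are assumptions, and it splits only on `\n` whereas the PR splits on additional delimiters. It also omits the error checks.

```javascript
// Hypothetical sketch: names and the regex are illustrative, not taken from the PR.
// Matches "MM:SS" or "H:MM:SS" and captures the text after the timestamp as the title.
const TIMESTAMP_LINE = /(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s*-?\s*(.*)$/

function timestampToSeconds(hours, minutes, seconds) {
  // The hours group is undefined for "MM:SS" timestamps, so default it to 0
  return Number(hours || 0) * 3600 + Number(minutes) * 60 + Number(seconds)
}

function sketchChaptersFromDescription(description, audioDuration) {
  const chapters = []
  // Simplification: only newlines are treated as line breaks here
  for (const line of description.split('\n')) {
    const match = line.match(TIMESTAMP_LINE)
    if (!match) continue
    const [, hours, minutes, seconds, title] = match
    const start = timestampToSeconds(hours, minutes, seconds)
    // Close the previous chapter at this chapter's start (assumes sequential timestamps)
    if (chapters.length) chapters[chapters.length - 1].end = start
    chapters.push({ id: chapters.length, start, end: null, title: title.trim() })
  }
  // The final chapter runs to the end of the audio file
  if (chapters.length) chapters[chapters.length - 1].end = audioDuration
  return chapters
}
```

For the three example lines above and an audio duration of 4000 seconds, this yields chapters starting at 0, 1800 and 3854 seconds, with the last one ending at 4000.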

Error checking

I believe this sort of feature should be quite conservative: if there are instances where we would be unsure of the state of a given timestamp, we should bail out of the entire process for the podcast episode. This is particularly important because we're treating them as neighboring chapters, so errors could propagate.

This implementation currently has the following error handling:

1. Throw on basic argument null checks
2. Throw if we're unable to scrape the title of a given chapter
3. Throw if we scrape and are only able to find one chapter (perhaps this isn't required, but one chapter seems unhelpful and I felt it could indicate some parsing failure)
4. Throw if there are timestamps past the end of the audio file
5. Throw if there are minutes or seconds over 59
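
The checks listed above could be expressed along these lines. This is a sketch under assumptions: the helper name `validateChapterCandidates`, the candidate shape `{ minutes, seconds, start, title }`, and the error messages are all hypothetical and not taken from the PR.

```javascript
// Hypothetical helper: the checks mirror the list above, but names and messages are assumptions.
function validateChapterCandidates(candidates, audioDuration) {
  // Basic argument null checks
  if (!Array.isArray(candidates) || audioDuration == null) {
    throw new Error('Missing arguments')
  }
  // A single chapter is treated as a likely parsing failure
  if (candidates.length < 2) {
    throw new Error('Fewer than two chapters found')
  }
  for (const { minutes, seconds, start, title } of candidates) {
    if (!title) throw new Error('Chapter title could not be scraped')
    if (minutes > 59 || seconds > 59) throw new Error('Minutes or seconds over 59')
    if (start > audioDuration) throw new Error('Timestamp past the end of the audio file')
  }
  return true
}
```

Throwing on the first failed check matches the conservative approach described earlier: one doubtful timestamp rejects the whole episode rather than producing partially wrong chapter boundaries.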

How have you tested this?

I have added a new test suite for this scraping code, and I've tried to cover a number of success and failure cases.
All of the checks listed under "Error checking" should be covered by tests.

I've also been running my fork with this for nearly a week and it's working well on the 3 podcasts I subscribe to which provide timestamps

Screenshots

Web interface

iOS app beta

Next steps

I wanted to open this PR to start a discussion with maintainers and get feedback.

I'm aware that there's a re-write of the front end ongoing, so I've tried to craft this PR as something that could land server side and then be included in the new UI. In the meantime, it could be enabled on a per-podcast basis via the API.
It would be nice to know if that would be something you'd be open to.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


adam added the pull-request label 2026-04-25 00:19:44 +02:00


Reference: starred/audiobookshelf#4433