[Enhancement]: Add Whisper support #1106

adam commented

2026-04-24 23:32:41 +02:00

Originally created by @apiweb on GitHub (Apr 25, 2023).

Describe the feature/enhancement

Hi there!

I've been using AudioBookShelf for a while now, and I love the platform. I was thinking about how it could be improved, and I had an idea that I wanted to share with you all.

I think it would be great if AudioBookShelf could integrate with the [Whisper](https://github.com/openai/whisper) speech-to-text model to automatically generate subtitles for audiobooks. This could be an optional external tool, like Tone and ffmpeg, that the user can enable or disable as needed.

With Whisper, it would be possible to transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise. It would make it easier for people who are hard of hearing or have difficulty understanding accents to enjoy audiobooks.

Here are some tips on how to integrate this feature into the AudioBookShelf flow:

- Use the metadata language tag to automatically set the Whisper language.
- Automatically save the srt using the [Title Folder Naming](https://www.audiobookshelf.org/docs#book-title-folder-naming) structure.
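
The two tips above could be sketched roughly as follows. This is only an illustration, not anything AudioBookShelf ships: `build_whisper_command` is a hypothetical helper, and it assumes the `whisper` command-line tool from openai/whisper is installed and that the metadata language tag is a value Whisper accepts (for example `en`). Because the output directory is the audio file's own folder, the resulting `.srt` lands in the book's title folder next to the audio.

```python
from pathlib import Path


def build_whisper_command(audio_path, language=None, model="base"):
    """Build an argument list for the openai-whisper CLI (hypothetical helper).

    The subtitle file is written next to the audio file, so it ends up
    inside the same title folder as the book itself.
    """
    audio = Path(audio_path)
    cmd = [
        "whisper", str(audio),
        "--model", model,
        "--output_format", "srt",
        "--output_dir", str(audio.parent),
    ]
    # Only pass a language when the metadata provides one;
    # otherwise Whisper falls back to auto-detection.
    if language:
        cmd += ["--language", language]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where Whisper is installed.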

I hope you will consider this suggestion for future updates to AudioBookShelf. Let me know if you have any questions or concerns.

Thank you for all your hard work making AudioBookShelf a great platform!

adam added the enhancement label 2026-04-24 23:32:41 +02:00

adam commented

2026-04-24 23:32:42 +02:00

@advplyr commented on GitHub (Apr 27, 2023):

Maybe we first add support for an srt subtitle file.

The opposite of this was also requested for ebooks #601

adam commented

2026-04-24 23:32:42 +02:00

@tehguitarist commented on GitHub (May 4, 2023):

I guess a natural extension (and this WOULD make audiobookshelf an Audible killer) would be Whispersync-like syncing with ebooks. I guess that would just mean shifting the bookmark position of the ebook or audiobook whenever the other one is progressed.

There isn't a lot of existing work on this specifically, but a quick Google search turned up https://github.com/readbeyond/aeneas/ and https://github.com/r4victor/syncabook.

Any library that can align audio with a text file would be a start, though obviously there's a bunch of work to find the start of each chapter and the start of the audiobook and match them up, so that's more of a pipe dream. aeneas probably shows the most promise, but obvious differences between the formats (any preamble by narrators, or a table of contents in ebooks) would all be factors. Without digging into the libraries, as long as there was enough error handling to wait until both files had matches (and skip over extras in one or the other), it may go quite smoothly.

adam commented

2026-04-24 23:32:43 +02:00

@advplyr commented on GitHub (May 4, 2023):

There is also this issue https://github.com/advplyr/audiobookshelf/issues/189

adam commented

2026-04-24 23:32:45 +02:00

@damajor commented on GitHub (May 5, 2023):

I tested Whisper on my setup and the results are pretty good but far from perfect.

Using the base model, it took around 4 minutes to transcribe a 1-hour audiobook.
Using the large model, it took a bit more than 1 hour to transcribe a 1-hour audiobook.

These tests were run on a 5950X with 12 parallel threads (no GPU involved).
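
To put those figures in perspective, here is a back-of-the-envelope estimate for longer books. It assumes transcription time scales linearly with audio length, which the tests above did not measure, and it treats the large-model figure as a lower bound since the run took "a bit more than" an hour. `estimate_minutes` is a hypothetical helper.

```python
# Minutes of CPU transcription per hour of audio, taken from the
# 5950X / 12 threads / no GPU figures quoted above.
MINUTES_PER_AUDIO_HOUR = {"base": 4.0, "large": 60.0}


def estimate_minutes(audio_hours, model):
    """Estimate transcription time in minutes, assuming linear scaling."""
    return audio_hours * MINUTES_PER_AUDIO_HOUR[model]


# A 10-hour audiobook: roughly 40 minutes with base,
# and at least 600 minutes (10 hours) with large.
print(estimate_minutes(10, "base"), estimate_minutes(10, "large"))
```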

adam commented

2026-04-24 23:32:45 +02:00

@turnercore commented on GitHub (Sep 13, 2023):

I think Whisper (or some kind of speech-to-text) integration could be really nice for transcribing audiobooks if people want subtitles. It would be a nice accessibility feature for someone listening in a second language, for example.

adam commented

2026-04-24 23:32:47 +02:00

@Ed1ks commented on GitHub (Oct 31, 2023):

👍 Thumbs up. This would be very nice for self-recorded audio files.

adam commented

2026-04-24 23:32:48 +02:00

@rounakdatta commented on GitHub (Nov 2, 2023):

Should we consider working on this? I can volunteer to start contributing.
Currently [Snipd](https://www.snipd.com/) is the only app that gets the AI-generated subtitles UX perfectly right, and this could be our chance to make Audiobookshelf the FOSS alternative to it.

adam commented

2026-04-24 23:32:50 +02:00

@advplyr commented on GitHub (Nov 2, 2023):

Yeah, I'm interested in this but I can't put much attention towards it now. We were talking about it in Discord the other day. If anyone wants to start putting something together or set up a proof of concept, that would be great. We can chat about it in Discord.

adam commented

2026-04-24 23:32:53 +02:00

@yuchen-lea commented on GitHub (Nov 21, 2023):

> Maybe we first add support for an `srt` subtitle file.
>
> The opposite of this was also requested for ebooks #601

@advplyr I agree with what you said about adding support for SRT subtitle files first. I have now used Whisper to generate corresponding subtitles for my local podcasts. On my computer, I can search and view them. Displaying subtitles while playing on the ABS mobile app is the final piece of the jigsaw.

I think these two things can share the same UI: LRC files #817 (external LRC files with the same name as the audio file or ID3 information embedded in the audio file) and SRT files #2257 (external SRT files with the same name as the audio file).
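
The "same name as the audio file" convention mentioned above could be detected along these lines. This is a sketch only: `find_sidecar` is a hypothetical helper, not part of ABS, and preferring `.srt` over `.lrc` when both exist is an arbitrary choice made for illustration.

```python
from pathlib import Path

# Sidecar formats discussed in this thread, in (assumed) order of preference.
SIDECAR_EXTENSIONS = (".srt", ".lrc")


def find_sidecar(audio_path):
    """Return the subtitle/lyrics file sharing the audio file's name, or None."""
    audio = Path(audio_path)
    for ext in SIDECAR_EXTENSIONS:
        candidate = audio.with_suffix(ext)  # e.g. episode.mp3 -> episode.srt
        if candidate.is_file():
            return candidate
    return None
```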

adam commented

2026-04-24 23:32:54 +02:00

@iamhenry commented on GitHub (Apr 19, 2024):

> Should we consider working on this? I can volunteer to start contributing. Currently [Snipd](https://www.snipd.com/) is the only app which gets AI-generated subtitles UX perfectly right, and this could be our chance to make Audiobookshelf the FOSS alternative to it.

This is exactly what would be great with ABS. I have the paid feature for Snipd, and now it's hard to take notes on audiobooks without it.

adam commented

2026-04-24 23:32:55 +02:00

@turnercore commented on GitHub (Apr 20, 2024):

I agree, and it isn't hard to generate the .srt files from audio now. Maybe there should be a branch to work on this. I'd propose doing it in this order:

1. Getting ABS to recognize .srt files next to audio files and display them in the UI in some way, like Snipd (I agree they do a great job with the UI).
2. Adding an .srt upload option for the files in the UI.
3. Creating a function that can use a Whisper URL to transcribe the files automatically, if set up.
4. Adding settings for the Whisper API URL, options to auto-transcribe new files, and a transcribe button.
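
For the first step, displaying subtitles during playback mostly comes down to parsing the .srt file and finding the cue that covers the current playback position. A minimal sketch, assuming well-formed SRT with cues sorted by start time and not overlapping; `Cue`, `parse_srt`, and `active_cue` are hypothetical names, not existing ABS code:

```python
import re
from bisect import bisect_right
from typing import NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(h, m, s, ms):
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text):
    """Parse SRT text into a list of cues, skipping blocks without a timing line."""
    cues = []
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                nums = [int(n) for n in match.groups()]
                cues.append(Cue(_to_seconds(*nums[:4]),
                                _to_seconds(*nums[4:]),
                                "\n".join(lines[i + 1:])))
                break
    return cues


def active_cue(cues, position):
    """Return the cue covering `position` (seconds), or None between cues."""
    starts = [cue.start for cue in cues]
    i = bisect_right(starts, position) - 1  # last cue starting at or before position
    if i >= 0 and position < cues[i].end:
        return cues[i]
    return None
```

A player would call `active_cue` on each time update and show the returned text, or nothing during gaps.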

Honestly as a further extension I would LOVE if you could do audio-clips like Snipd that could export to Obsidian or something, but I think having the ability and UI set up for transcriptions would be the first hurdle for that. We could add audio clips on back button like Snipd does after that.

adam commented

2026-04-24 23:32:58 +02:00

@zkvsky commented on GitHub (Apr 27, 2024):

Just FYI, there are several implementations of Whisper specifically tailored to subtitle generation. This one, for example, https://github.com/jianfch/stable-ts, can generate not only srt but also ssa/ass karaoke-style subs (meaning that the currently spoken word is highlighted), bringing us even closer to Snipd. In my experience, the base and small models are enough with it.

adam commented

2026-04-24 23:32:58 +02:00

@turnercore commented on GitHub (May 1, 2024):

> just FYI there are several implementations of whisper specifically tailored to subtitle generation. This one for example https://github.com/jianfch/stable-ts can not only generate srt, but also ssa/ass karaoke style subs [meaning that the current spoken word is highlighted] bringing us even closer to snipd. From my experience base and small models are enough with it.

Thanks for the heads up, hopefully I can get some time to work on a PR for this type of thing. I haven't contributed yet though so I imagine it will take me a bit to get familiar with the code base and what needs to be updated for this kind of feature.

adam commented

2026-04-24 23:32:59 +02:00

@iamhenry commented on GitHub (May 2, 2024):

@turnercore that would be awesome!

adam referenced this issue

2026-04-25 00:15:41 +02:00

[PR #1106] [MERGED] Makes the dev target support auto reloading of the server #3445
