mirror of
https://github.com/advplyr/audiobookshelf.git
synced 2026-05-30 23:40:40 +02:00
[Enhancement]: Add Whisper support #1106
Open
opened 2026-04-24 23:32:41 +02:00 by adam
14 comments
Reference: starred/audiobookshelf#1106
Originally created by @apiweb on GitHub (Apr 25, 2023).
Describe the feature/enhancement
Hi there!
I've been using AudioBookShelf for a while now, and I love the platform. I was thinking about how it could be improved, and I had an idea that I wanted to share with you all.
I think it would be great if AudioBookShelf could integrate with the Whisper speech-to-text model to automatically generate subtitles for audiobooks. This could be an external tool, like Tone and ffmpeg, that the user could enable or disable as needed.
With Whisper, it would be possible to transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise. It would make it easier for people who are hard of hearing or have difficulty understanding accents to enjoy audiobooks.
Here are some tips on how to integrate this feature into the AudioBookShelf flow:
I hope you will consider this suggestion for future updates to AudioBookShelf. Let me know if you have any questions or concerns.
Thank you for all your hard work making AudioBookShelf a great platform!
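The external-tool idea in the post above can be sketched as a thin wrapper that shells out to Whisper only when the feature is switched on. This is a minimal sketch, not Audiobookshelf code: it assumes the `whisper` command installed by the openai-whisper Python package and its `--model`, `--output_format`, `--output_dir` and `--language` options, and the function names `build_whisper_command` and `transcribe_if_enabled` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_whisper_command(audio_path, output_dir, model="base", language=None):
    """Build the argument list for the openai-whisper CLI (nothing is run here)."""
    cmd = [
        "whisper",
        str(audio_path),
        "--model", model,
        "--output_format", "srt",
        "--output_dir", str(output_dir),
    ]
    if language:
        cmd += ["--language", language]
    return cmd


def transcribe_if_enabled(audio_path, output_dir, enabled=True):
    """Run Whisper only if the user enabled it and the binary is on PATH."""
    if not enabled or shutil.which("whisper") is None:
        return None
    subprocess.run(build_whisper_command(audio_path, output_dir), check=True)
    # Assumption: the subtitle is written as <audio stem>.srt in output_dir;
    # the exact naming may differ between Whisper releases.
    return Path(output_dir) / (Path(audio_path).stem + ".srt")
```

Keeping the command builder separate from the runner means the optional dependency is only touched when the setting is on, which is how a tool like ffmpeg is usually wrapped.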
@advplyr commented on GitHub (Apr 27, 2023):
Maybe we first add support for an srt subtitle file.
The opposite of this was also requested for ebooks #601
@tehguitarist commented on GitHub (May 4, 2023):
I guess a natural extension (and this WOULD make audiobookshelf an Audible killer) would be Whispersync-like syncing with ebooks. I guess that would just mean shifting the bookmark position of the ebook/audiobook whenever one or the other is progressed.
There isn't a lot of existing work on this specifically, but here are some projects from a quick Google search: https://github.com/readbeyond/aeneas/ and https://github.com/r4victor/syncabook.
Any library that can match up audio against a text file could work, though obviously there's a bunch of work to find the start of the chapter and the start of the audiobook and match those up, so that's more of a pipe dream. Aeneas probably shows the most reasonable promise, but obvious differences between the formats (any preamble by narrators, or a table of contents in ebooks) would all be factors. Without digging into the libraries, as long as there was enough error handling to wait until both files had matches (and skip over extras in one or the other), that may go quite smoothly.
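The "wait until both sides match and skip extras" idea can be illustrated with the standard library alone. This is a rough sketch and not how aeneas or syncabook work internally: it swaps in Python's `difflib.SequenceMatcher` to align a transcript's words with the ebook's words, and the name `align_words` and the sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def align_words(transcript_words, ebook_words):
    """Return (transcript_index, ebook_index) pairs for words found in both."""
    a = [w.lower() for w in transcript_words]
    b = [w.lower() for w in ebook_words]
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical words; the final block has size 0.
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs


# The narrator preamble and the ebook's table of contents have no partner
# on the other side, so they are simply skipped.
spoken = "this is audible chapter one it was a dark night".split()
printed = "contents chapter one it was a dark and stormy night".split()
pairs = align_words(spoken, printed)
```

If each transcript word also carries a timestamp, every matched pair becomes an anchor between an audio time and an ebook position, which is the information a bookmark sync would need.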
@advplyr commented on GitHub (May 4, 2023):
There is also this issue https://github.com/advplyr/audiobookshelf/issues/189
@damajor commented on GitHub (May 5, 2023):
I tested Whisper on my setup and the results are kind of good but far from perfect.
Using the base model, it took around 4 minutes to transcribe a 1-hour audiobook.
Using the large model, it took a bit more than 1 hour to transcribe a 1-hour audiobook.
Those tests were done on a 5950X with 12 parallel threads (no GPU involved).
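To put those timings in perspective, they can be turned into a real-time factor (processing time divided by audio length) and scaled to a longer book. This is simple arithmetic on the numbers reported above; the 10-hour book is a made-up example, and the large-model figure is a lower bound since the run took "a bit more than" an hour.

```python
def realtime_factor(processing_minutes, audio_minutes):
    """Minutes of CPU work needed per minute of audio."""
    return processing_minutes / audio_minutes


def estimated_hours(audio_hours, factor):
    """Scale a measured real-time factor to a longer recording."""
    return audio_hours * factor


base_factor = realtime_factor(4, 60)    # base model: 4 min per 1 h of audio
large_factor = realtime_factor(60, 60)  # large model: at least 1 h per 1 h

# Hypothetical 10-hour audiobook on the same CPU-only setup.
base_estimate = estimated_hours(10, base_factor)    # about 0.67 h, i.e. 40 minutes
large_estimate = estimated_hours(10, large_factor)  # at least 10 h
```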
@turnercore commented on GitHub (Sep 13, 2023):
I think Whisper (or some kind of speech-to-text) integration could be really nice for transcribing audiobooks if people want subtitles. It would be a nice accessibility feature for someone listening in a second language, for example.
@Ed1ks commented on GitHub (Oct 31, 2023):
👍 Would be very nice for self-recorded audio files.
@rounakdatta commented on GitHub (Nov 2, 2023):
Should we consider working on this? I can volunteer to start contributing.
Currently Snipd is the only app that gets the AI-generated subtitles UX perfectly right, and this could be our chance to make Audiobookshelf the FOSS alternative to it.
@advplyr commented on GitHub (Nov 2, 2023):
Yeah, I'm interested in this but I can't put much attention towards it now. We were talking about it in Discord the other day. If anyone wants to start putting something together or set up a proof of concept, that would be great. We can chat about it in Discord.
@yuchen-lea commented on GitHub (Nov 21, 2023):
@advplyr I agree with what you said about adding support for SRT subtitle files first. I have now used Whisper to generate corresponding subtitles for my local podcasts. On my computer, I can search and view them. Displaying subtitles while playing on the ABS mobile app is the final piece of the jigsaw.
I think these two things can share the same UI: LRC files #817 (external LRC files with the same name as the audio file or ID3 information embedded in the audio file) and SRT files #2257 (external SRT files with the same name as the audio file).
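A minimal sketch of the sidecar convention described above: look for an `.srt` file with the same name as the audio file, parse its cues, and find the cue that is active at the current playback position. The helper names `sidecar_path`, `parse_srt` and `cue_at` are hypothetical, and the parser only handles plain SRT timing lines; it is not Audiobookshelf's implementation.

```python
import re
from bisect import bisect_right
from pathlib import Path

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def sidecar_path(audio_path, extension=".srt"):
    """Subtitle file with the same name as the audio file."""
    return Path(audio_path).with_suffix(extension)


def parse_srt(text):
    """Return a sorted list of (start_seconds, end_seconds, text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                cues.append((start, end, "\n".join(lines[i + 1:]).strip()))
                break
    cues.sort()
    return cues


def cue_at(cues, position_seconds):
    """Text of the cue covering the playback position, or None in a gap."""
    starts = [cue[0] for cue in cues]
    i = bisect_right(starts, position_seconds) - 1
    if i >= 0 and position_seconds < cues[i][1]:
        return cues[i][2]
    return None


sample = (
    "1\n00:00:01,000 --> 00:00:04,000\nChapter one.\n\n"
    "2\n00:00:04,500 --> 00:00:07,250\nIt was a dark night.\n"
)
cues = parse_srt(sample)
```

An LRC reader could return the same `(start, end, text)` tuples, so a player UI would only need `cue_at` regardless of which file type was found.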
@iamhenry commented on GitHub (Apr 19, 2024):
This is exactly what would be great with ABS. I have the paid feature for Snipd, and now it's hard to take notes on audiobooks without it.
@turnercore commented on GitHub (Apr 20, 2024):
I agree, and it isn't hard to generate the .srt files from audio now. Maybe there should be a branch to work on this. I'd propose doing it in this order:
Honestly as a further extension I would LOVE if you could do audio-clips like Snipd that could export to Obsidian or something, but I think having the ability and UI set up for transcriptions would be the first hurdle for that. We could add audio clips on back button like Snipd does after that.
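As an illustration of how little is needed to produce an .srt file once a transcription exists, here is a small sketch that formats timed segments as SRT. It assumes each segment is a dict with `start`, `end` (in seconds) and `text` keys, which is the shape openai-whisper's transcription result is understood to use; the function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Join timed segments into numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


example = segments_to_srt([
    {"start": 0.0, "end": 2.5, "text": " Chapter one."},
    {"start": 2.5, "end": 6.0, "text": " It was a dark night."},
])
```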
@zkvsky commented on GitHub (Apr 27, 2024):
Just FYI, there are several implementations of Whisper specifically tailored to subtitle generation. This one, for example, https://github.com/jianfch/stable-ts can generate not only srt but also ssa/ass karaoke-style subs [meaning that the currently spoken word is highlighted], bringing us even closer to Snipd. From my experience, the base and small models are enough with it.
@turnercore commented on GitHub (May 1, 2024):
Thanks for the heads up, hopefully I can get some time to work on a PR for this type of thing. I haven't contributed yet though, so I imagine it will take me a bit to get familiar with the code base and what needs to be updated for this kind of feature.
@iamhenry commented on GitHub (May 2, 2024):
@turnercore that would be awesome!