[Enhancement]: generate images from text and show them while playing the book #1863

Closed
opened 2026-04-25 00:00:35 +02:00 by adam · 0 comments
Owner

Originally created by @dadino on GitHub (Apr 5, 2024).

Describe the feature/enhancement

This is a crazy idea and we're probably a few years away from being able to do it on our local machines. Treat this request more as a conversation.

When something like this (https://github.com/advplyr/audiobookshelf/issues/1723) is implemented to generate subtitles from the audio files, we could generate images for each page/minute/chapter/scene/whatever and show them in the player when listening to an audiobook.

When an audiobook is imported and its subtitles are generated, queue image generation using a predefined prompt (something like "an illustration for a book of this scene: %s") that the user could also change (maybe I want a specific style for the illustrations).
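A rough sketch of the prompt templating, assuming the template carries a single `%s` placeholder as in the example above. `DEFAULT_TEMPLATE` and `build_image_prompt` are made-up names for illustration, not anything that exists in ABS:

```python
DEFAULT_TEMPLATE = "an illustration for a book of this scene: %s"


def build_image_prompt(scene: str, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill a scene description into the default or a user-supplied template."""
    if template.count("%s") != 1:
        raise ValueError("template must contain exactly one %s placeholder")
    # str.replace (rather than the % operator) avoids surprises when the
    # scene text or the user's template contains other '%' characters.
    return template.replace("%s", scene.strip())
```

A user who wants a specific style would only swap the template, e.g. `build_image_prompt(scene, "a watercolor painting of: %s")`.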

Since image generation prompts are usually short, I guess we should give all the previous (read) text to an LLM and ask it to "create a short prompt of the current scene/chapter to use in an image generator model, from this text: %s".
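Something like this could build the request for the LLM. It is only a sketch: the cut-off to the last `max_chars` characters is my assumption (models have a limited context window, and the most recent text probably matters most), the default of 8000 is arbitrary, and `build_summary_request` is a hypothetical name. The actual call to an LLM is left out on purpose.

```python
SUMMARY_TEMPLATE = (
    "create a short prompt of the current scene/chapter to use in an "
    "image generator model, from this text: %s"
)


def build_summary_request(read_text: str, max_chars: int = 8000) -> str:
    """Build the LLM request from the subtitle text read so far.

    Only the last `max_chars` characters are kept (an assumption, see above).
    """
    # Guard against max_chars == 0: read_text[-0:] would return everything.
    tail = read_text[-max_chars:] if max_chars > 0 else ""
    return SUMMARY_TEMPLATE.replace("%s", tail.strip())
```

The short text the LLM returns would then be fed into the image prompt template.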

The file structure would be pretty simple: a folder with images and a text file with timestamps (like an SRT file). This could spark a community-generated library of "book illustrations". It could also mean that ABS could just be a consumer of this format, not the creator.
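To make the format idea concrete, here is a minimal sketch of how a player could consume it. The exact layout is an assumption on my part: SRT-style blocks where the text line is an image filename instead of a subtitle. All names (`parse_index`, `image_at`, the `.png` files) are hypothetical.

```python
import bisect
import re
from typing import List, Optional, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, image filename)
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:05:00,000 to seconds."""
    m = TIME_RE.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return h * 3600 + mi * 60 + s + ms / 1000


def parse_index(text: str) -> List[Cue]:
    """Parse SRT-like blocks into cues sorted by start time."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Expected block: cue number, "start --> end", image filename.
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        cues.append((to_seconds(start), to_seconds(end), lines[2]))
    return sorted(cues)


def image_at(cues: List[Cue], position: float) -> Optional[str]:
    """Return the image whose time range covers `position`, if any."""
    starts = [c[0] for c in cues]
    i = bisect.bisect_right(starts, position) - 1
    if i >= 0 and position < cues[i][1]:
        return cues[i][2]
    return None


EXAMPLE = """
1
00:00:00,000 --> 00:05:00,000
0001.png

2
00:05:00,000 --> 00:12:30,000
0002.png
"""

cues = parse_index(EXAMPLE)
print(image_at(cues, 10.0))   # image shown ten seconds into the book
print(image_at(cues, 300.0))  # image shown at the five-minute mark
```

Because the index is plain text next to the images, a community-made pack could be dropped into a book folder without ABS generating anything itself.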

Why?

I listen to audiobooks in small chunks of 5-15 minutes, and having some context when I start a session would be great.
I also love cool images. Having illustrations for spaceship battles while listening to Expeditionary Force, changing with every battle, switching to some tacticool Ruhar soldier when boots hit the ground or when some crazy plan takes place on an alien planet, would throw me into the book right away.

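The Reddit post that sparked this idea for me: https://www.reddit.com/r/StableDiffusion/comments/1bppt3e/ok_guys_this_is_the_future_of_reading_ebook_llm_sd/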

adam added the enhancement label 2026-04-25 00:00:35 +02:00
adam closed this issue 2026-04-25 00:00:35 +02:00

Reference: starred/audiobookshelf#1863