mirror of
https://github.com/advplyr/audiobookshelf.git
synced 2026-05-30 23:40:40 +02:00
[Bug]: Podcasts duplicate in UIs #1834
Closed
opened 2026-04-24 23:59:27 +02:00 by adam · 16 comments
Labels: bug
Reference: starred/audiobookshelf#1834
Originally created by @BlackHoleFox on GitHub (Mar 23, 2024).
Describe the issue
Audiobookshelf seems to be generating, or somehow finding, duplicate episodes of multiple podcasts I have added and listen to. This is similar to https://github.com/advplyr/audiobookshelf/issues/2122 but doesn't create duplicate files on disk and happens in versions after that bugfix.
Weirdly this doesn't happen with every single podcast in my library, just a few. The ones I notice the most are:
Nothing jumps out in my logs either, though let me know if I should enable debug logs for a week or so to help.
I've deleted duplicates from the library multiple times at this point, but they just seem to be coming back. For example AMCA's library entry has every new episode duplicated right now:

One of each is not linked to an RSS episode:

And there are no duplicates on disk. They're purely database/UI located. They show up duplicated in both the web UI and Android app:

Steps to reproduce the issue
Schedule Automatic Episode Downloads. I have it running every day at midnight.
Audiobookshelf version
2.8.0
How are you running audiobookshelf?
Docker
@advplyr commented on GitHub (Mar 23, 2024):
Can you disable the watcher in server settings as a test to see if you get duplicates on any episodes downloaded after that?
@BlackHoleFox commented on GitHub (Mar 23, 2024):
Turned it off and restarted audiobookshelf's container. I'll clean up the duplicates in the two library items I mentioned and see if they come back.
@BlackHoleFox commented on GitHub (Mar 24, 2024):
Update: It seems to be the library scanner causing these? It runs periodically which would explain why they show up at a delay. It autoran in the last hour or two and the duplicates showed up. I cleaned out the duplicates from both podcasts once more and manually started a scan from Settings-->Libraries-->Scan. The scan finished and told me some items updated:
And then they showed up again in the podcast's library page. All the duplicated ones also showed up in the Newest Episodes/recently added row. The actual newest episode is one to the right off screen:
I can reliably reproduce this now with the steps above.
@advplyr commented on GitHub (Mar 24, 2024):
Can you share some details about how you are mapping volumes in docker and what file system/OS you are using?
@BlackHoleFox commented on GitHub (Mar 31, 2024):
Yeah, here you are:
ext4 but see below for the rest.
My audiobookshelf data is stored between two places: The metadata and config for it is stored on the host but the podcasts themselves are stored elsewhere (for size and reliability reasons) and then mounted via SMB on the host running the container. From there, the podcasts folder is bind mounted into Docker so audiobookshelf can see it. Here's the relevant part of my Docker Compose config:
@BlackHoleFox commented on GitHub (Apr 28, 2024):
@advplyr Hiyo I'm back with more info after more digging around. I started under the hypothesis this might have been something specific to my SMB share setup but after reviewing all the audiobookshelf logs with debug logging turned on, I saw nothing pointing at the storage location causing the weird broken episode duplicates.
Starting this off, I double/extra/etc checked there aren't duplicates in the RSS feed:

And now: Here's the debug and scanner logs when initiating a scan after removing 1 of the duplicate podcast episodes from the UI. It's a little odd that it says updated here but the scanner log says "new podcast episode".
And then the relevant part of the scan log. The rest of it is just saying every other podcast is up to date:
When I remove one of these broken episodes, nothing indicates a problem:
Looking at the all episode list, the normal and broken episode appear (stealth strike). The second goes away temporarily when I delete it from the UI. But it always returns after a library scan.
So with not much progress on the surface the database got cloned to my computer and opened as well :) Despite deleting broken episodes in the UI, they still had a database entry, which looks very wrong compared to the correct one:

There's a bunch of "haunted" items like this in my library throughout multiple podcasts, and those all have broken duplicate database entries:

The interesting/important thing is that when I deleted the broken record from the podcastEpisodes table, pushed the modified database to my server, and started the docker container it properly scrubbed the haunted episode away. When I scan the library, everything reports up-to-date and the duplicate doesn't reappear. Checking the RSS feed for new episodes doesn't find any junk either.
Pretty puzzled how they got there, honestly. Nothing weird or dangerous has been done to the database or filesystem. Maybe it's the result of broken RSS feeds churning and leaving remnants in the database if the creators re-uploaded etc?
So that was a lot, but I'm interested in what you think of the broken database rows as someone who actually knows what everything should look like. Maybe a possible fix could be improving the scanner to look for any rows with a NULL index (and maybe no enclosureSize + enclosureType as well) and deleting them as a cleanup task? I could also manually scrub the database once but honestly these might eventually trickle back in, so an automatic purger could be better?
@advplyr commented on GitHub (Apr 28, 2024):
The enclosure fields are only populated if the episode was downloaded from an RSS feed. Not all podcast episodes need to come from an RSS feed, you may have audio files in your file system that are a podcast but don't have an associated RSS feed. So we wouldn't want to remove them automatically from the db like you are suggesting.
That is what the duplicates you are seeing are. They are episodes that were scanned in from the file system and were not matched with an existing episode while being scanned in so a new episode was created.
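For anyone who wants to list rows like the ones discussed above without deleting anything, here is a minimal read-only sketch in Python. It is an illustration only: the table name podcastEpisodes and the columns index, enclosureSize and enclosureType come from this thread, while the id, podcastId and title columns, the sample rows, and the helper names are assumptions, not the real Audiobookshelf schema. Run anything like this against a copy of the database, never the live file.

```python
import sqlite3

# Hypothetical, simplified schema. Only "index", enclosureSize and
# enclosureType are named in this thread; the other columns are assumed.
SCHEMA = """
CREATE TABLE podcastEpisodes (
    id TEXT PRIMARY KEY,
    podcastId TEXT,
    title TEXT,
    "index" INTEGER,
    enclosureSize INTEGER,
    enclosureType TEXT
)
"""


def find_suspect_episodes(conn):
    """Return (id, title) for rows with no index and no enclosure data.

    Read-only on purpose: episodes scanned from disk without an RSS feed
    legitimately look like this, so the result is a list to review,
    not a list to delete.
    """
    rows = conn.execute(
        "SELECT id, title FROM podcastEpisodes "
        'WHERE "index" IS NULL '
        "AND enclosureSize IS NULL AND enclosureType IS NULL "
        "ORDER BY title, id"
    )
    return rows.fetchall()


def demo():
    """Build an in-memory table with made-up rows and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO podcastEpisodes VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("ep-1", "pod-1", "Example Episode", 12, 52428800, "audio/mpeg"),
            ("ep-2", "pod-1", "Example Episode", None, None, None),
        ],
    )
    return find_suspect_episodes(conn)
```

Calling demo() returns only the second, made-up row. Keeping the query as a SELECT matches the point made above: a missing enclosure is not proof that a row is a duplicate.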
Still more information is needed to figure out what is going on. If you can enable Debug logs in the Logs page of settings this will provide more information during the scan.
Then re-create the issue and we should see where both episodes are being created, one will be from the scanner while the other will be getting created during the RSS feed download.
Also if you can be on the latest version v2.9.0
@BlackHoleFox commented on GitHub (Apr 29, 2024):
Got it 👍. Does the same train of thought apply to the NULL index field observation as well?
Sounds pretty strange, honestly. I don't manually edit the filesystem Audiobookshelf is working with and get everything from RSS. Could filesystem metadata changes (mtime, ACLs) etc break this (and only for a specific set of episodes)? There also have never been duplicate files for these episodes on disk. It points to the same exact MP3 file the real episode does and is why I can't check the "hard delete" option for these ghost episodes in the UI or it will delete the source file leaving the true episode broken.
All the logs I shared above were with Debug logging enabled :rip: There wasn't any more detail available from the scan which "brought back" a deleted ghost episode.
Yeah I will try, but I still don't know what actually causes these to be created. They have just randomly appeared so far.
I updated a few days ago, so all the logs above come from 2.9.0 :) Appreciate the help so far though.
@BlackHoleFox commented on GitHub (Jul 14, 2024):
Tis back with more logs, if they are helpful @advplyr. A new episode of A More Civilized Age came out a few days ago and got duplicated in my library. Nothing about the storage backend for the episodes has changed at all and I haven't modified any files manually.
I am currently running v2.11.0. If you want, I can email you a zip file of the last week of logs.
From the logs of the day it downloaded:
Following days when re-scans occurred:
@advplyr commented on GitHub (Jul 14, 2024):
I can't think of any reasons for it
@advplyr commented on GitHub (Aug 16, 2024):
I came across the duplicate podcast episode but after reading through your issue again I don't think it is the same. It is a different bug where the scanner can scan in the audio file while it is being downloaded causing a duplicate episode with bad data like a different duration.
I'm not sure if I mentioned this before elsewhere but Abs uses the inode value of the files to see if they already exist. If you are using CIFS there is a setting that needs to be enabled called serverino in order for it to work properly with Abs.
See this comment and the thread https://github.com/advplyr/audiobookshelf/issues/2509#issuecomment-1891047531
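To illustrate why stable inode numbers matter here, the following is a minimal Python sketch of an inode-based "have I seen this file before?" check. It is not Audiobookshelf's scanner code; the function name and the sample file names are made up for the example. On a CIFS mount without serverino the client generates its own temporary inode numbers instead of using the server's, so a check like this could stop recognising a file it has already recorded.

```python
import os
import tempfile


def is_known_file(path, known_inodes):
    """True if the file's inode number was recorded by an earlier scan."""
    return os.stat(path).st_ino in known_inodes


def demo():
    """Record a file's inode, rename the file, and check it again."""
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, "episode.mp3")
        with open(old_path, "wb") as fh:
            fh.write(b"audio")
        known = {os.stat(old_path).st_ino}

        # A rename within the same filesystem keeps the inode, so the
        # file is still recognised even though its path changed.
        new_path = os.path.join(tmp, "renamed.mp3")
        os.rename(old_path, new_path)
        return is_known_file(new_path, known)
```

The same lookup only works across scans if the filesystem reports the same inode for the same file every time, which is the reason for the mount option mentioned above.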
@BlackHoleFox commented on GitHub (Aug 16, 2024):
Heyo, very glad to hear you were able to at least reproduce something like it even if we aren't sure it's the same. Funnily enough I think we might be on the path though? My current hypothesis is that a race condition is occurring when downloading too :)
Shortly after I made that logging PR (which, thanks for improving on) I started investigating another lead based on the log timestamps. After seeing a bunch of independent components' timestamps close together I looked again at the database and saw that the corrupt episode and the real one were updated within milliseconds of each other (the one with the GUID is the right one):
createdAt is also close:
The reason for being highly suspicious of this is because I changed the default scanning time for some of my podcasts ages ago to run every day at midnight. Here's the one for A More Civilized Age, the podcast I've been using in this thread as my example. Along with it is scheduled time for my automatic library scan, which is identical. The scheduled times also match when the episode gets downloaded and duplicated based on the database timestamps:
I created a test library, also stored on the same CIFS/SMB mount, 5 days ago and gave it identical settings except for making my sample podcast automatically check for downloads every hour instead so that it would download long before midnight and the automatic scan's scheduled time. I was going to post an update here with my results once another episode came out as I wanted to try and test it "organically" like it would occur in my main library. But since you seem to be thinking in the same direction as me I'm dumping my notes earlier to maybe help out and save you unneeded work.
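The timestamp comparison described above (two rows for the same episode created within milliseconds of each other) can be sketched in Python as follows. This is only an illustration: the dictionary keys, the one-second window, matching on title, and all sample values are assumptions made for the example, not data from the actual database.

```python
from datetime import datetime, timedelta
from itertools import combinations


def close_pairs(episodes, window=timedelta(seconds=1)):
    """Return id pairs of same-titled episodes created within `window`."""
    pairs = []
    for a, b in combinations(episodes, 2):
        if a["title"] != b["title"]:
            continue
        if abs(a["createdAt"] - b["createdAt"]) <= window:
            pairs.append((a["id"], b["id"]))
    return pairs


def demo():
    """Run the check on made-up rows: one older episode and one close pair."""
    midnight = datetime(2024, 7, 11, 0, 0, 3)
    episodes = [
        {"id": "ep-old", "title": "Older Episode",
         "createdAt": midnight - timedelta(days=7)},
        {"id": "ep-real", "title": "New Episode",
         "createdAt": midnight},
        {"id": "ep-ghost", "title": "New Episode",
         "createdAt": midnight + timedelta(milliseconds=40)},
    ]
    return close_pairs(episodes)
```

A pair flagged this way is only a hint that a download and a scan touched the same file at about the same moment; it does not say which row is the correct one.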
Thanks for the reference. Read through the thread and double checked my CIFS/SMB server mount for the podcasts. afaict it's had serverino enabled since day 1 with Audiobookshelf:
Figured it couldn't hurt to look at the recorded inodes of the latest corrupted episode, and they and the filenames match, whatever that's worth:
@BlackHoleFox commented on GitHub (Aug 26, 2024):
I return, again, hello. I've concluded the test that I mentioned I was going to experiment with a few weeks ago. My test library seems to now support the theory I wrote out before about a race condition. A new episode of my sample podcast came out a few days ago and so I went to go check on how both my libraries handled it. The results came out as hypothesized :)
You can see here below that the real library ended up with yet another duplicate for this episode. But the test library (which is also being stored on my NAS via SMB) did not duplicate anything at all:
Again, the only difference between these two is the time they check for episodes daily in the podcast's configuration. This seems like decently strong evidence supporting the theory of a race condition, and maybe is something that you could try and configure yourself locally to reproduce with 100% certainty.
I also made sure some other podcasts had their automatic download timer staggered compared to the library-wide scan and I have not seen any duplicates arrive from them either.
@advplyr commented on GitHub (Nov 8, 2024):
I was able to fix the issue where the scanner runs while an episode is being downloaded causing a duplicate
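As a rough illustration of this kind of race and one way to guard against it, here is a minimal Python sketch in which a scanner skips any file that a downloader has marked as still in progress. This is not the change that shipped in Audiobookshelf; the class, function names, and paths are hypothetical and exist only for the example.

```python
import threading


class DownloadTracker:
    """Tracks paths a downloader is still writing (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = set()

    def start(self, path):
        with self._lock:
            self._in_progress.add(path)

    def finish(self, path):
        with self._lock:
            self._in_progress.discard(path)

    def is_downloading(self, path):
        with self._lock:
            return path in self._in_progress


def scan(paths, known_paths, tracker):
    """Return the paths a scan would add as new episodes."""
    new_items = []
    for path in paths:
        if path in known_paths:
            continue
        if tracker.is_downloading(path):
            # Leave this file for the downloader to register.
            continue
        new_items.append(path)
    return new_items


def demo():
    """Compare a guarded scan with an unguarded one during a download."""
    on_disk = ["/podcasts/show/ep1.mp3", "/podcasts/show/ep2.mp3"]
    known = {"/podcasts/show/ep1.mp3"}

    tracker = DownloadTracker()
    tracker.start("/podcasts/show/ep2.mp3")
    guarded = scan(on_disk, known, tracker)

    # With nothing marked as downloading, the half-written file is
    # picked up as a new episode, which is the duplicate case.
    unguarded = scan(on_disk, known, DownloadTracker())
    return guarded, unguarded
```

In the demo the guarded scan adds nothing, while the unguarded scan picks up ep2.mp3 a second time.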
@BlackHoleFox commented on GitHub (Nov 10, 2024):
Hurray, thanks 🎉. Will give it a spin once a new release goes out.
@github-actions[bot] commented on GitHub (Nov 17, 2024):
Fixed in v2.17.0.