[PR #463] [MERGED] Update Audible scraper to be more strict about what it considers an ASIN and a valid ASIN query response #3357

Closed
opened 2026-04-25 00:15:20 +02:00 by adam · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/advplyr/audiobookshelf/pull/463
Author: @selfhost-alt
Created: 4/16/2022
Status: Merged
Merged: 4/18/2022
Merged by: @advplyr

Base: master ← Head: strict-asin-check


📝 Commits (3)

  • d6c5b6e Implement a stricter check for possible ASIN values in titles
  • cdcfd01 Only consider an Audible ASIN query successful if the response contains an author
  • 2fc60e4 Handle an undefined publisher_summary when querying Audible

📊 Changes

1 file changed (+3 additions, -3 deletions)


📝 server/providers/Audible.js (+3 -3)

📄 Description

When doing a scrape of my library, I noticed the following exception in the logs.

[2022-04-16T17:33:21.380Z] DEBUG: [Audible] ASIN url: https://api.audible.com/1.0/catalog/products/GOLDEN%20RENDEZVOUS?response_groups=rating%2Cseries%2Ccontributors%2Cproduct_desc%2Cmedia%2Cproduct_extended_attrs&image_sizes=500%2C1024%2C2000
(node:25) UnhandledPromiseRejectionWarning: TypeError: string-strip-html/stripHtml(): [THROW_ID_01] Input must be string! Currently it's: undefined, equal to:
undefined
    at stripHtml (/node_modules/string-strip-html/dist/string-strip-html.cjs.js:228:11)
    at Audible.cleanResult (/server/providers/Audible.js:20:26)
    at /server/providers/Audible.js:64:65
    at Array.map (<anonymous>)
    at Audible.search (/server/providers/Audible.js:64:48)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async BookFinder.getAudibleResults (/server/BookFinder.js:162:17)
    at async Scanner.quickMatchBook (/server/scanner/Scanner.js:632:19)
    at async Scanner.matchLibraryBooks (/server/scanner/Scanner.js:701:20)
(node:25) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 31)

This appears to happen because the Audible scraper found a sequence of 10 capital letters ("RENDEZVOUS") in the book title "GOLDEN RENDEZVOUS" and assumed that meant the title was an ASIN.
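The false positive can be reproduced with a small sketch. The pre-fix pattern is not quoted in this description, so `looseAsinPattern` and `looksLikeAsinLoose` below are assumed, hypothetical names for an unanchored check that would behave as described:

```javascript
// Assumed pre-fix behaviour: an unanchored pattern, so any run of
// 10 uppercase alphanumerics anywhere in the title counts as an ASIN.
const looseAsinPattern = /[0-9A-Z]{10}/;

// Hypothetical helper, not the actual function in Audible.js.
function looksLikeAsinLoose(title) {
  return looseAsinPattern.test(title);
}

console.log(looksLikeAsinLoose('GOLDEN RENDEZVOUS')); // true: "RENDEZVOUS" is 10 capitals
console.log(looksLikeAsinLoose('Golden Rendezvous')); // false: no run of 10 capitals
```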

Then the secondary problem is that Audible's API apparently returns a 200 even when it gets an invalid ASIN. The response looks like this:

{"product":{"asin":"GOLDEN RENDEZVOUS","rating":{"num_reviews":0,"overall_distribution":{"average_rating":0.0,"display_average_rating":"0.0","display_stars":0.0,"num_five_star_ratings":0,"num_four_star_ratings":0,"num_one_star_ratings":0,"num_ratings":0,"num_three_star_ratings":0,"num_two_star_ratings":0},"performance_distribution":{"average_rating":0.0,"display_average_rating":"0.0","display_stars":0.0,"num_five_star_ratings":0,"num_four_star_ratings":0,"num_one_star_ratings":0,"num_ratings":0,"num_three_star_ratings":0,"num_two_star_ratings":0},"story_distribution":{"average_rating":0.0,"display_average_rating":"0.0","display_stars":0.0,"num_five_star_ratings":0,"num_four_star_ratings":0,"num_one_star_ratings":0,"num_ratings":0,"num_three_star_ratings":0,"num_two_star_ratings":0}}},"response_groups":["product_desc","always-returned","product_extended_attrs","contributors","series","rating","media"]}

So the Audible scraper tries to parse the response and chokes because most of the fields are missing (specifically, it fails when trying to remove HTML tags from the publisher_summary).
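A minimal sketch of guarding against the missing field, in the spirit of the third commit ("Handle an undefined publisher_summary"). `stripTags` is a naive, dependency-free stand-in for the `string-strip-html` call seen in the stack trace, and `cleanSummary` is a hypothetical helper name, not the code in `Audible.js`:

```javascript
// Naive stand-in for string-strip-html, used only to keep this sketch
// self-contained. It is not a robust HTML sanitizer.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '');
}

// Hypothetical helper: only strip tags when the summary is actually a string,
// otherwise fall back to an empty description instead of throwing.
function cleanSummary(product) {
  const summary = product.publisher_summary;
  return typeof summary === 'string' ? stripTags(summary).trim() : '';
}

// Shaped like the near-empty product in the response above.
const emptyProduct = { asin: 'GOLDEN RENDEZVOUS', rating: { num_reviews: 0 } };

console.log(cleanSummary(emptyProduct)); // ''
console.log(cleanSummary({ publisher_summary: '<p>A thriller at sea.</p>' })); // 'A thriller at sea.'
```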

This diff addresses these issues with two changes. First, it makes the ASIN matching regex more strict so it will only match a title that is exactly 10 uppercase alphanumeric characters. Second, it only considers the ASIN query response to be successful if it returns an authors field. That seemed like the most likely field to always be present in a valid response, but I'm happy to update that to something else if preferred.
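The two checks could look roughly like the sketch below. The function names, the exact pattern, and the assumption that `authors` sits under `product` in the response body are illustrative and not copied from the diff. The sample ASIN is a made-up value of the right shape.

```javascript
// Anchored pattern: the whole title must be exactly 10 uppercase alphanumerics.
const strictAsinPattern = /^[0-9A-Z]{10}$/;

// Hypothetical helper for the stricter title check.
function isProbablyAsin(title) {
  return typeof title === 'string' && strictAsinPattern.test(title);
}

// Hypothetical helper: treat a 200 response as a hit only when the product
// carries an authors field (assumed to live at body.product.authors).
function isSuccessfulAsinResponse(body) {
  const product = body && body.product;
  return Boolean(product && product.authors);
}

console.log(isProbablyAsin('GOLDEN RENDEZVOUS')); // false: 17 characters including a space
console.log(isProbablyAsin('B002V0QK4C'));        // true: exactly 10 uppercase alphanumerics

// A body shaped like the invalid-ASIN response above has no authors.
console.log(isSuccessfulAsinResponse({ product: { asin: 'GOLDEN RENDEZVOUS' } })); // false
console.log(isSuccessfulAsinResponse({ product: { authors: [{ name: 'Some Author' }] } })); // true
```

Note that a title which really is exactly 10 uppercase alphanumerics (for example a single 10-letter word in capitals) would still pass the first check, which is why the response check is needed as a second line of defence.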


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

adam added the pull-request label 2026-04-25 00:15:20 +02:00
adam closed this issue 2026-04-25 00:15:20 +02:00
Reference: starred/audiobookshelf#3357