mirror of
https://github.com/dehydrated-io/dehydrated.git
synced 2026-01-13 15:13:33 +01:00
Does not limit retries to letsencrypt APIs #378
Originally created by @patrakov on GitHub (Sep 19, 2018).
Here is an endless loop that retries the check for pending challenges:
https://github.com/lukas2511/dehydrated/blob/master/dehydrated#L803
Today I had it running for at least 20 minutes, and the validation was still pending, so I pressed Ctrl+C.
I think it is a temporary problem on the Let's Encrypt side, but still, infinite retries mean there is a risk of getting banned.
@cpu commented on GitHub (Sep 19, 2018):
Hi @patrakov
Can you share the authorization ID that you were stuck polling? I suspect you may have encountered this Let's Encrypt bug, which has been fixed in Boulder master but not yet deployed (should be ~Thursday). Edit: see my follow-up reply; this was not the case here. I agree that it's a good idea for Dehydrated to limit retries regardless of the bug fix :-)
@patrakov commented on GitHub (Sep 19, 2018):
I am afraid of posting anything from the "bash -x dehydrated -c -x" log here, because I do not know what is public and what is private.
@cpu commented on GitHub (Sep 19, 2018):
@patrakov If you'd prefer, you can share the log in an email to cpu@letsencrypt.org.
@cpu commented on GitHub (Sep 19, 2018):
With the additional log data provided by @patrakov I was able to confirm this bug was not the reason for this infinite loop.
Instead what I believe happened is as follows (using example data):
On 19/09/2018 ~7:58 UTC the ACME client created an order for {example.com, www.example.com}. Two pending authorizations were created. The client POSTed the HTTP-01 challenge of both, solving the challenges and getting two valid authorizations: one for example.com, one for www.example.com. The order was finalized and a certificate was issued.
On 19/09/2018 ~8:24 UTC the ACME client created another order for {example.com, www.example.com}. Since the previous order was valid and not pending, a new order was created. Since the client already had valid authorizations for the two domain names, they were reused, and an order was returned in "ready" status with two valid authorizations. The client ignored this and POSTed the DNS-01 challenge of the already valid authorization. This produced an error because the authorization was not pending. The client ignored this error and began polling the DNS-01 challenge, waiting for it to transition out of status "pending". This will never happen, because the overall authorization is already valid by way of the HTTP-01 challenge, and so the client entered an infinite loop.
We've seen this class of bug before, and it usually results from the ACME client ignoring the order and authorization state (both valid) and becoming fixated on a specific challenge's state (in this case DNS-01). The log data doesn't match up with the race-condition bug in Boulder that will soon be fixed. Apologies for adding that noise! I hope the server-side log analysis helps @lukas2511 understand the root issue.
Thanks for providing the data @patrakov !
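The failure mode described above can be sketched in shell. This is an illustrative fragment only, not dehydrated's actual code: the JSON string and all variable names are invented, and a real client would use a proper JSON parser rather than sed. The point is the guard: check the authorization's overall status before fixating on any one challenge.

```shell
# Hypothetical sketch: an ACME authorization that is already "valid" overall,
# even though the dns-01 challenge inside it never succeeded. Polling that
# dns-01 challenge for a "pending" -> "valid" transition would loop forever.
authz_json='{"status": "valid", "challenges": [{"type": "dns-01", "status": "invalid"}]}'

# Crude extraction of the top-level status field (illustration only; the
# anchored pattern deliberately matches the first "status" key in the object).
authz_status="$(printf '%s\n' "${authz_json}" | sed -n 's/^{"status": *"\([a-z]*\)".*/\1/p')"

if [[ "${authz_status}" = "valid" ]]; then
  # Nothing left to do: the authorization was satisfied earlier (via http-01).
  echo "authorization already valid, skipping challenge polling"
else
  echo "authorization is ${authz_status}, polling challenge"
fi
```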
@patrakov commented on GitHub (Sep 20, 2018):
I confirm that it was a forced renewal after changing the challenge type from http-01 to dns-01.
@lukas2511 commented on GitHub (Dec 10, 2020):
This should be fixed by now as it appears to have been a side-effect of bad JSON parsing.
@patrakov commented on GitHub (Dec 10, 2020):
Well, this particular case was due to bad JSON parsing. However, the issue is about something completely different: retries should be limited no matter what other bugs exist in the code. I.e., this loop condition is bad:
while || ; do
There should be an additional hard limit on the number of retries.
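A bounded version of such a polling loop could look like the sketch below. This is not a patch against dehydrated; poll_challenge, check_challenge_status, and the limit of 5 are invented names for illustration. The stub status check never leaves "pending", emulating the stuck state from this issue, and shows that the loop now terminates anyway.

```shell
# Hypothetical sketch: polling with a hard retry cap instead of an
# unbounded while loop. (A real client would also sleep between attempts.)
poll_challenge() {
  local max_retries=5
  local retries=0
  local status
  status="$(check_challenge_status)"
  while [[ "${status}" = "pending" ]] || [[ "${status}" = "processing" ]]; do
    retries=$((retries + 1))
    if (( retries >= max_retries )); then
      echo "ERROR: challenge still ${status} after ${max_retries} attempts, giving up" >&2
      return 1
    fi
    status="$(check_challenge_status)"
  done
}

# Stub that always reports "pending", emulating the stuck server-side state.
check_challenge_status() { echo "pending"; }

if poll_challenge; then
  echo "challenge completed"
else
  echo "polling aborted at the hard retry limit"
fi
```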