mirror of
https://github.com/dehydrated-io/dehydrated.git
synced 2026-01-13 15:13:33 +01:00
Does not limit retries to letsencrypt APIs #378
Originally created by @patrakov on GitHub (Sep 19, 2018).
Here is an endless loop that retries the check for pending challenges:
https://github.com/lukas2511/dehydrated/blob/master/dehydrated#L803
Today I had it running for at least 20 minutes, and the validation was still pending, so I pressed Ctrl+C.
I think it is a temporary problem on the Let's Encrypt side, but still, infinite retries mean there is a risk of getting banned.
@cpu commented on GitHub (Sep 19, 2018):
Hi @patrakov
Can you share the authorization ID that you were stuck polling? I suspect you may have encountered this Let's Encrypt bug, which has been fixed in Boulder master but not yet deployed (should be ~Thursday). Edit: see my follow-up reply; this was not the case here. I agree that it's a good idea for Dehydrated to limit retries regardless of the bug fix :-)
@patrakov commented on GitHub (Sep 19, 2018):
I am afraid of posting anything from the "bash -x dehydrated -c -x" log here, because I do not know what is public and what is private.
@cpu commented on GitHub (Sep 19, 2018):
@patrakov If you'd prefer, you can share the log in an email to cpu@letsencrypt.org.
@cpu commented on GitHub (Sep 19, 2018):
With the additional log data provided by @patrakov I was able to confirm this bug was not the reason for this infinite loop.
Instead what I believe happened is as follows (using example data):
On 19/09/2018 ~7:58 UTC the ACME client created an order for {example.com, www.example.com}. Two pending authorizations were created. The client POSTed the HTTP-01 challenge of both, solving the challenges and getting two valid authorizations: one for example.com, one for www.example.com. The order was finalized and a certificate was issued.
On 19/09/2018 ~8:24 UTC the ACME client created another order for {example.com, www.example.com}. Since the previous order was valid and not pending, a new order was created. Since the client already had valid authorizations for the two domain names, they were reused, and an order was returned in "ready" status with two valid authorizations. The client ignored this and POSTed the DNS-01 challenge of the already valid authorization. This produced an error because the authorization was not pending. The client ignored this error and began polling the DNS-01 challenge, waiting for it to transition out of status "pending". This will never happen, because the overall authorization is already valid by way of the HTTP-01 challenge, and so the client entered an infinite loop.
We've seen this class of bug before, and it usually results from the ACME client ignoring the order and authorization state (both valid) and becoming fixated on a specific challenge's state (in this case DNS-01). The log data doesn't match up with the race-condition bug in Boulder that will soon be fixed. Apologies for adding that noise! I hope the server-side log analysis helps @lukas2511 understand the root issue.
Thanks for providing the data @patrakov !
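The failure mode described above can be sketched in shell. This is an illustrative fragment only, not dehydrated's actual code: the JSON string and all variable names are invented, and a real client would use a proper JSON parser rather than sed. The point is the guard: check the authorization's overall status before fixating on any one challenge.

```shell
# Hypothetical sketch: an ACME authorization that is already "valid" overall,
# even though the dns-01 challenge inside it never succeeded. Polling that
# dns-01 challenge for a "pending" -> "valid" transition would loop forever.
authz_json='{"status": "valid", "challenges": [{"type": "dns-01", "status": "invalid"}]}'

# Crude extraction of the top-level status field (illustration only; the
# anchored pattern deliberately matches the first "status" key in the object).
authz_status="$(printf '%s\n' "${authz_json}" | sed -n 's/^{"status": *"\([a-z]*\)".*/\1/p')"

if [[ "${authz_status}" = "valid" ]]; then
  # Nothing left to do: the authorization was satisfied earlier (via http-01).
  echo "authorization already valid, skipping challenge polling"
else
  echo "authorization is ${authz_status}, polling challenge"
fi
```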
@patrakov commented on GitHub (Sep 20, 2018):
I confirm that it was a forced renewal after changing the challenge type from http-01 to dns-01.
@lukas2511 commented on GitHub (Dec 10, 2020):
This should be fixed by now as it appears to have been a side-effect of bad JSON parsing.
@patrakov commented on GitHub (Dec 10, 2020):
Well, this particular case was due to bad JSON parsing. However, the issue is about something completely different: retries should be limited no matter what other bugs exist in the code. I.e., this loop condition is bad:
while || ; do
There should be an additional hard limit on the number of retries.
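A bounded version of such a polling loop could look like the sketch below. This is not a patch against dehydrated; poll_challenge, check_challenge_status, and the limit of 5 are invented names for illustration. The stub status check never leaves "pending", emulating the stuck state from this issue, and shows that the loop now terminates anyway.

```shell
# Hypothetical sketch: polling with a hard retry cap instead of an
# unbounded while loop. (A real client would also sleep between attempts.)
poll_challenge() {
  local max_retries=5
  local retries=0
  local status
  status="$(check_challenge_status)"
  while [[ "${status}" = "pending" ]] || [[ "${status}" = "processing" ]]; do
    retries=$((retries + 1))
    if (( retries >= max_retries )); then
      echo "ERROR: challenge still ${status} after ${max_retries} attempts, giving up" >&2
      return 1
    fi
    status="$(check_challenge_status)"
  done
}

# Stub that always reports "pending", emulating the stuck server-side state.
check_challenge_status() { echo "pending"; }

if poll_challenge; then
  echo "challenge completed"
else
  echo "polling aborted at the hard retry limit"
fi
```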