Dehydrated does not keep track of deploy hook failures #209

New Issue

adam · 2025-12-29T01:18:58+01:00

adam commented

2025-12-29 01:18:58 +01:00

Originally created by @pv2b on GitHub (Mar 30, 2017).

In case a deploy hook fails, dehydrated does not make a note of this to try to redeploy the certificate on the next cron run.

Here's some sample output of this:

$ dehydrated -c
# INFO: Using main config file /etc/dehydrated/config
Processing certtest.itassistans.net
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting challenge for certtest.itassistans.net...
 + Responding to challenge for certtest.itassistans.net...
 + Challenge is valid!
 + Requesting certificate...
 + Checking certificate...
 + Done!
 + Creating fullchain.pem...
writing RSA key
Traceback (most recent call last):
  File "/usr/local/bin/deploy_nsx_sslvpn_certificate", line 127, in <module>
    for line in file('/etc/dehydated/nsx-edges'):
IOError: [Errno 2] No such file or directory: '/etc/dehydated/nsx-edges'
$ vi /usr/local/bin/deploy_nsx_sslvpn_certificate
$ dehydrated -c
# INFO: Using main config file /etc/dehydrated/config
Processing certtest.itassistans.net
 + Checking domain name(s) of existing cert... unchanged.
 + Checking expire date of existing cert...
 + Valid till Jun 28 08:25:00 2017 GMT (Longer than 30 days). Skipping renew!

In this case the problem was a typo in my script :-) But it might just as well have been some kind of temporary retryable error causing the deploy to fail.

In order to fix this problem, dehydrated needs to keep a note of failed hook executions and re-try them on next cron run.

Originally created by @pv2b on GitHub (Mar 30, 2017). In case a deploy hook fails, dehydrated does not make a note of this to try to redeploy the certificate on the next cron run. Here's some sample output of this: ``` $ dehydrated -c # INFO: Using main config file /etc/dehydrated/config Processing certtest.itassistans.net + Signing domains... + Generating private key... + Generating signing request... + Requesting challenge for certtest.itassistans.net... + Responding to challenge for certtest.itassistans.net... + Challenge is valid! + Requesting certificate... + Checking certificate... + Done! + Creating fullchain.pem... writing RSA key Traceback (most recent call last): File "/usr/local/bin/deploy_nsx_sslvpn_certificate", line 127, in <module> for line in file('/etc/dehydated/nsx-edges'): IOError: [Errno 2] No such file or directory: '/etc/dehydated/nsx-edges' $ vi /usr/local/bin/deploy_nsx_sslvpn_certificate $ dehydrated -c # INFO: Using main config file /etc/dehydrated/config Processing certtest.itassistans.net + Checking domain name(s) of existing cert... unchanged. + Checking expire date of existing cert... + Valid till Jun 28 08:25:00 2017 GMT (Longer than 30 days). Skipping renew! ``` In this case the problem was a typo in my script :-) But it might just as well have been some kind of temporary retryable error causing the deploy to fail. In order to fix this problem, dehydrated needs to keep a note of failed hook executions and re-try them on next cron run.

adam closed this issue

2025-12-29 01:18:58 +01:00

adam commented

2025-12-29 01:18:59 +01:00

@txr13 commented on GitHub (Mar 30, 2017):

I feel that this assumes too much. For one, how often is the script executed through cron--and is it executed through cron at all? Some people run it through cron every day, some every week, others every two weeks... there's no guarantee that the "next cron run" will happen in time to deploy the cert before the previous cert expires.

The script can also be run (with hook) completely manually. In this case, waiting for the "next cron run" may never happen--and if the script is executed manually later, it's very questionable that the user would want the script to suddenly start redeploying scripts from the last run, especially if they were now using a different hook entirely. (And if you want a hook to incorporate this feature in a self-contained manner, you are always free to write one to fit your needs!)

Finally, the failed deploy may be already caught through manual review of cron logs, or by inspection of the installed certificate, and fixed through manual intervention. It would be pointless at best to redeploy in this situation.

For all of these reasons, I feel it is better not to automatically retry failed hook calls. Any such functionality would be better handled from within the hook itself, rather than assuming anything about how the main script will be run.

@txr13 commented on GitHub (Mar 30, 2017): I feel that this assumes too much. For one, how often is the script executed through cron--and is it executed through cron at all? Some people run it through cron every day, some every week, others every two weeks... there's no guarantee that the "next cron run" will happen in time to deploy the cert before the previous cert expires. The script can also be run (with hook) completely manually. In this case, waiting for the "next cron run" may never happen--and if the script is executed manually later, it's very questionable that the user would want the script to suddenly start redeploying scripts from the last run, especially if they were now using a different hook entirely. (And if you want a hook to incorporate this feature in a self-contained manner, you are always free to write one to fit your needs!) Finally, the failed deploy may be already caught through manual review of cron logs, or by inspection of the installed certificate, and fixed through manual intervention. It would be pointless at best to redeploy in this situation. For all of these reasons, I feel it is better not to automatically retry failed hook calls. Any such functionality would be better handled from within the hook itself, rather than assuming anything about how the main script will be run.

adam commented

2025-12-29 01:18:59 +01:00

@pv2b commented on GitHub (Mar 30, 2017):

I hear what you're saying but I have to disagree, because leaving Dehydrated as it is would mean inconsistent behavior.

If I run Dehydrated, and for some reason, it isn't able to obtain a certificate, re-running Dehydrated will attempt to obtain the certificate again if re-run. The cause of the failure is immaterial - it could be anything from the fact that the cron job didn't run due to a power failure, or a network problem causing Dehydrated to be unable to reach the LetsEncrypt servers, or an outage at LetsEncrypt. The only thing that matters is that LetsEncrypt does not have an up-to-date certificate in the certificate store.

The fact that running Dehydrated will re-try failed operations is therefore not surprising to an administrator. The fact that deploys are not retried on subsequent runs of Dehydrated (no matter if run from cron, manually, or in some other way) is surprising in that context. So, as per the Rule of Least Surprise, implementing retry of certificate deploy is a good idea.

Running Dehydrated on a regular basis is still what you should do anyway in order to mitigate errors on the issuance side. Personally I'd suggest an interval between 1-7 days, to permit enough retries, but that's neither here nor there.

With all this considered, I can understand the point of view that introducing a feature for retries in Dehydrated may be an undesired increase in complexity. Perhaps, in view of that, the fact that Dehydrated does have a certificate deploy hook at all is the design mistake? Perhaps, as per the concept of Worse Is Better, removing that would be the right way forward. :-)

@pv2b commented on GitHub (Mar 30, 2017): I hear what you're saying but I have to disagree, because leaving Dehydrated as it is would mean inconsistent behavior. If I run Dehydrated, and for some reason, it isn't able to obtain a certificate, re-running Dehydrated will attempt to obtain the certificate again if re-run. The cause of the failure is immaterial - it could be anything from the fact that the cron job didn't run due to a power failure, or a network problem causing Dehydrated to be unable to reach the LetsEncrypt servers, or an outage at LetsEncrypt. The only thing that matters is that LetsEncrypt does not have an up-to-date certificate in the certificate store. The fact that running Dehydrated will re-try failed operations is therefore not surprising to an administrator. The fact that deploys are not retried on subsequent runs of Dehydrated (no matter if run from cron, manually, or in some other way) is surprising in that context. So, as per the Rule of Least Surprise, implementing retry of certificate deploy is a good idea. Running Dehydrated on a regular basis is still what you should do anyway in order to mitigate errors on the issuance side. Personally I'd suggest an interval between 1-7 days, to permit enough retries, but that's neither here nor there. With all this considered, I can understand the point of view that introducing a feature for retries in Dehydrated may be an undesired increase in complexity. Perhaps, in view of that, the fact that Dehydrated does have a certificate deploy hook at all is the design mistake? Perhaps, as per the concept of Worse Is Better, removing that would be the right way forward. :-)

adam commented

2025-12-29 01:18:59 +01:00

@txr13 commented on GitHub (Mar 30, 2017):

There's a fundamental difference between retrying failed renews and retrying failed deploys.

A failed renew is easily detected by seeing that the expiry date hasn't been updated. The script naturally sees this and attempts another renewal. There's no additional logic needed; this is the same process as any renewal, whether it failed previously or not.

This script is not built to do everything. It's built specifically to handle the renewals and the process around that. Hooks are made available to extend the main script's functionality; wrappers can be written to add even more. But the functionality of the hooks and/or wrappers are not dehydrated's concern. All dehydrated has to do is call the hooks.

I do not agree that retrying failed renews is inconsistent with not retrying failed deploys. One activity is dehydrated's main function; the other is an optional feature enabled by other scripts not under dehydrated's purview. Let those other scripts do their own error detection and recovery, if they wish.

I also disagree with the suggestion to simply remove the hook altogether. It's an optional feature offered to allow others a way to write in additional functionality. There are those who use that hook, and successfully. I'm not persuaded that removing an optional way to extend functionality solves any issues yet raised.

@txr13 commented on GitHub (Mar 30, 2017): There's a fundamental difference between retrying failed renews and retrying failed deploys. A failed renew is easily detected by seeing that the expiry date hasn't been updated. The script naturally sees this and attempts another renewal. There's no additional logic needed; this is the same process as any renewal, whether it failed previously or not. This script is not built to do everything. It's built specifically to handle the renewals and the process around that. Hooks are made available to extend the main script's functionality; wrappers can be written to add even more. But the functionality of the hooks and/or wrappers are not dehydrated's concern. All dehydrated has to do is call the hooks. I do not agree that retrying failed renews is inconsistent with not retrying failed deploys. One activity is dehydrated's main function; the other is an optional feature enabled by other scripts not under dehydrated's purview. Let those other scripts do their own error detection and recovery, if they wish. I also disagree with the suggestion to simply remove the hook altogether. It's an optional feature offered to allow others a way to write in additional functionality. There are those who use that hook, and successfully. I'm not persuaded that removing an optional way to extend functionality solves any issues yet raised.

adam commented

2025-12-29 01:18:59 +01:00

@lukas2511 commented on GitHub (Mar 30, 2017):

I agree with what @txr13 said, here is an idea on how to solve this yourself:

Instead of doing all your operations directly in the deploy hook write the parameters into a queue, then on the exit hook or even on the unchanged hook run a script that checks if there is anything in the queue, does whatever it needs to do, and removes it from the queue.
If it fails it will just try again on the next hook execution.

There are a lot of other options for you to implement this easily for yourself, and there might even be cases where it would be bad if a deployment would run twice (or unexpected), so I'm not going to implement this inside of dehydrated.

@lukas2511 commented on GitHub (Mar 30, 2017): I agree with what @txr13 said, here is an idea on how to solve this yourself: Instead of doing all your operations directly in the deploy hook write the parameters into a queue, then on the exit hook or even on the unchanged hook run a script that checks if there is anything in the queue, does whatever it needs to do, and removes it from the queue. If it fails it will just try again on the next hook execution. There are a lot of other options for you to implement this easily for yourself, and there might even be cases where it would be bad if a deployment would run twice (or unexpected), so I'm not going to implement this inside of dehydrated.

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: starred/dehydrated#209