mirror of
https://github.com/dehydrated-io/dehydrated.git
synced 2026-01-11 22:30:44 +01:00
JWS has invalid anti-replay nonce #336
Originally created by @legsak1mbo on GitHub (Apr 25, 2018).
I'm getting this error with increasing frequency. The response looks something like:-
The only "fix" appears to be re-running dehydrated, sometimes several times, until it succeeds.
In https://github.com/diafygi/gethttpsforfree/issues/150#issuecomment-380361381 they suggest that "nonce timeouts are becoming more common". I assume that's what I'm seeing here too?
@lukas2511 commented on GitHub (Apr 25, 2018):
Mh that is weird, I never encountered that issue... A nonce timeout seems unlikely to me as dehydrated retrieves a nonce for every signed request and immediately uses it.
If you encounter this issue often you could help me identify the issue by adding an `echo "Nonce: ${nonce}" >&2` after the whole code-block marked as `# Retrieve nonce from acme-server`.
If it happens again compare the nonce to the previous working ones and see if it is somehow shorter or looks completely different, has a special character the other nonces don't have, whatever it could be.
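The comparison suggested above can be partly automated. This is a minimal sketch, not part of dehydrated: `describe_nonce` is a hypothetical helper name, and it relies on the ACME rule that nonces are base64url strings, so anything outside `A-Z`, `a-z`, `0-9`, `-` and `_` (including a stray carriage return) is flagged.

```shell
# Hypothetical helper, not part of dehydrated: report whether a nonce looks unusual.
# Assumption: a well-formed nonce contains only base64url characters.
describe_nonce() {
  nonce_value="$1"
  case "$nonce_value" in
    "")               echo "empty" ;;
    *[!A-Za-z0-9_-]*) echo "unexpected-characters length=${#nonce_value}" ;;
    *)                echo "ok length=${#nonce_value}" ;;
  esac
}
```

It could be called from the debug line, for example `echo "Nonce: ${nonce} ($(describe_nonce "${nonce}"))" >&2`, so that working and failing runs are easy to compare by length and character set.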
@CliffS commented on GitHub (Apr 26, 2018):
I am also seeing this. Example output:
@CliffS commented on GitHub (Apr 26, 2018):
In case it helps, my config file contains:
@lukas2511 commented on GitHub (Apr 26, 2018):
This is a really hard issue to debug... I'm running dehydrated in a loop right now but I'm not able to get it to fail even once. I'll try to implement retries using the nonce sent back by the server, but it's really really hard for me to test as I just can't get it to fail.
@legsak1mbo commented on GitHub (Apr 26, 2018):
OK, so it fails first time on every machine I'm trying it on.
Output (with echo) is below...
@lukas2511 commented on GitHub (Apr 26, 2018):
@CliffS @legsak1mbo do you by any chance have multiple egress ip addresses or a dual stack (ipv4+ipv6) setup? can you verify if the issue goes away if you set `IP_VERSION=6` in your config file?
@legsak1mbo commented on GitHub (Apr 26, 2018):
Not using multiple egress addresses or IPv6 here.
@lukas2511 commented on GitHub (Apr 26, 2018):
@legsak1mbo are you sure? no NAT or something that could result in the request coming from a different IP? that's basically the only way I'm able to reproduce this issue.
@legsak1mbo commented on GitHub (Apr 26, 2018):
I don't believe so. Certainly nothing that would change between requests in the same run.
@lukas2511 commented on GitHub (Apr 26, 2018):
@legsak1mbo can you do a few `curl https://my-ipv4.kurz.pw` requests and see if the result changes between runs? just to make sure.
@legsak1mbo commented on GitHub (Apr 26, 2018):
Well heck, it certainly does!
Time to get on the phone to my ISP...
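The repeated-`curl` check above can be scripted. This is a rough sketch under stated assumptions: the function names are made up for illustration, and the lookup command is passed in by the caller so nothing here touches the network on its own.

```shell
# Count distinct non-empty lines on stdin (one reported IP address per line).
count_distinct_ips() {
  grep -v '^[[:space:]]*$' | sort -u | wc -l | tr -d ' '
}

# egress_ip_is_stable SAMPLES COMMAND [ARGS...]
# Runs COMMAND SAMPLES times and succeeds only if every run printed the same address.
egress_ip_is_stable() {
  samples="$1"
  shift
  i=0
  seen=""
  while [ "$i" -lt "$samples" ]; do
    seen="${seen}$("$@")
"
    i=$((i + 1))
  done
  [ "$(printf '%s' "$seen" | count_distinct_ips)" -eq 1 ]
}

# Example, using the lookup service mentioned in this thread:
# egress_ip_is_stable 5 curl -s https://my-ipv4.kurz.pw || echo "egress IP changes between requests"
```

A failing check points at NAT or load balancing in front of the machine rather than at dehydrated itself.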
@lukas2511 commented on GitHub (Apr 26, 2018):
@legsak1mbo yea.. meh. in that case even retries wouldn't do you any good as it would be basically luck-based if the request goes through cleanly...
@CliffS commented on GitHub (Apr 26, 2018):
@lukas2511 Fixing it to IPv6 appears to have solved the problem for me.
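For reference, the workaround discussed here is a single line in the dehydrated config file. The value below is only an example; which address family gives a stable path depends on the network, and people in this thread report success with both.

```shell
# dehydrated config file: force all requests over one address family.
# Use 4 or 6, whichever reaches the ACME server over a stable path on your network.
IP_VERSION=4
```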
@smortex commented on GitHub (May 4, 2018):
We encountered the same problem today. It appears that a customer has changed the DNS configuration of one of the domains of the certificate failing to renew to a previous configuration where the A record was the IP address of an old shared hosting by OVH and no AAAA record was set.
Because OVH can do TLS for shared hosting through letsencrypt, my guess is that when the letsencrypt validation server tries to fetch a token it gets one from OVH (maybe an old one, and of course it's not what the validation server expects so the renewal fails… and the "invalid anti-replay nonce" message makes sense).
@CliffS Maybe it's worth double-checking that both IPv4 and IPv6 resolve to the same server for your domain: that would explain why renewing over IPv4 would work if accessing through IPv6 brings you somewhere else?
@CliffS commented on GitHub (May 4, 2018):
@smortex Interestingly there was no reverse DNS for the IPv6 address, though IPv4 reverse was correct. I have fixed the IPv6 reverse and I will retest without forcing the IPv6.
@smortex commented on GitHub (May 4, 2018):
@CliffS I don't think reverse DNS has an impact here. I was thinking about the IPv4 address and the IPv6 address not being served by the same machine.
Just like for example http://www.kame.net/ is not the same site over IPv4 and IPv6… Static image in one case, animated gif otherwise 😉
@AceSlash commented on GitHub (Jun 6, 2018):
I had this issue today on a certificate with 6 alternative names: it was failing randomly on one of them.
After talking about it on irc with @lukas2511 and reading this thread, setting `IP_VERSION=6` did indeed fix the issue for me. The server in question has 2 IPv4 addresses and 1 IPv6 address, but never had the issue before.
Checking with `curl https://my-ipv4.kurz.pw`, I always see the same IPv4 address, so I don't think it flickers.
I'll try to test that by creating a certificate with a lot of alternative names and running tcpdump to capture the result and see what exactly is going on.
@major commented on GitHub (Jun 12, 2018):
I was having this same problem today and found that setting `IP_VERSION=4` fixed the issue. My laptop has an IPv4 and IPv6 address.
@lukavia commented on GitHub (Jun 18, 2018):
I had this issue today, but unfortunately the ISP won't change its behavior.
@lukas2511 looking around the forums at LetsEncrypt there was a suggestion that the client retry the request with the nonce from the response a few (reasonable) times before giving up.
Would it be hard to implement this in dehydrated?
@lukas2511 commented on GitHub (Jun 30, 2018):
@lukavia i want to implement two things in the not-too-far future:
I want to try to find a way to resolve the api hostname only once, so that every further curl call uses the same server, this will solve this bug.
I also want to add retries, but those are a lower priority for me as use-cases with hundreds of domains per single certificate are low and everything else can quickly be solved by just running the script again.
@lukavia commented on GitHub (Jun 30, 2018):
@lukas2511 Unfortunately even if you resolve the api hostname only once, the problem will persist since the provider route would still be different every time. Here is an example:
I have 2 internet providers behind a pfSense router. pfSense load-balances between them one-to-one. That means that each request to the same ip goes out through the other provider, so the originating (your) ip is different every time.
So I think retry feature should be higher priority.
P.S. Since you use curl, you can just run the "host" command once to get the ip, and then use that ip with a Host header for each request
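A variant of the P.S. above, sketched here as an assumption rather than anything dehydrated does: curl's `--resolve host:port:addr` option pins a hostname to one address while still requesting the real hostname, whereas connecting to a bare IP and only setting the Host header usually fails TLS certificate verification. `pin_curl_opts` is a hypothetical helper, `192.0.2.10` is a documentation placeholder, and the sketch assumes `CURL_OPTS` is passed to curl word-split.

```shell
# Build a CURL_OPTS value that pins HOST:443 to a fixed address.
# The address is supplied by the caller; nothing is resolved here.
pin_curl_opts() {
  host="$1"
  addr="$2"
  printf '%s %s:443:%s\n' '--resolve' "$host" "$addr"
}

# Example for the dehydrated config (placeholder address, not a real endpoint):
# CURL_OPTS="$(pin_curl_opts acme-v02.api.letsencrypt.org 192.0.2.10)"
```

As noted in the comment above, pinning the destination only helps when the remote address is what varies; it does nothing if the source address changes between requests.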
@olivluca commented on GitHub (Jun 30, 2018):
@lukavia you can simply program a pfsense rule to route traffic to letsencrypt through one provider (i.e do not load balance it).
That's what I did when I had the same issue.
@lukavia commented on GitHub (Jun 30, 2018):
This was an example of the problem. When the provider does the same you don't have access to those settings.
@bohwaz commented on GitHub (Jul 7, 2018):
I also have the same issue, it seems to happen randomly, I have to launch dehydrated in a loop until it succeeds…
@mcv21 commented on GitHub (Jul 11, 2018):
We're also seeing this, probably related to being behind NAT (so outgoing IP changes all the time). It's not clear to me why this should matter - is the source IP encoded in the Nonce somehow, or is it stateless at the server and you're just getting a different remote server each time?
In any case, having dehydrated be able to retry with the new Nonce each time would be better, but perhaps this is a problem with Boulder itself?
@yverry commented on GitHub (Jul 14, 2018):
From my side when I've added `IP_VERSION=6` nonce error disappeared
@neoKushan commented on GitHub (Oct 30, 2018):
Just to chime in, as I encountered this issue on a completely unrelated system and Googling brought me here.
This issue is essentially caused by LE being unable to get the ACME challenge from the specified domain name. It's clearly not as simple as DNS not being set up correctly, as it's more nuanced than this.
A lot of the people in this thread have found out that when you have multiple IP addresses, they don't always route to the same endpoint. Likewise if you're on a shared IP of any kind, there's no way to guarantee that you'll get the right host either. This is why, for a lot of people, setting IP_VERSION=6 or IP_VERSION=4 "fixes" the issue: it simply removes the "other" IP addresses. Essentially, it boils down to your local configuration/network/setup and that's why there's no single thing that will "fix" it.
In my case, IP addresses weren't the issue but rather a redirect was redirecting .well-known incorrectly, causing it to return a 200 with content, just not the content of the ACME challenge - hence "bad nonce". Had it returned a 404, you'd have got the much more useful error that contains the link to the renewal failure report.
I was able to figure this out by simply trying to navigate to .com/.well-known/acme-challenge/ - it should return the nonce directly and not anything else.
To sum up, if you're getting this error:
@znerol commented on GitHub (Dec 5, 2018):
I'm running `dehydrated` as part of an integration-test on Travis. I did run into this issue since some of their test workers are behind a NAT. First thing i tried was to find forward proxy software which implements connection pooling/reuse for HTTPS. The closest thing I've found is some adventurous nginx/lua approach.
I ended up tunneling all `curl` requests through `tor` when running the `dehydrated` test on Travis. This might not be acceptable in production though.
@staples1347 commented on GitHub (Jun 10, 2019):
I am getting this error quite a lot with IPv6. My server is using the same static IP when sending, but I noticed with tcpdump curl seems to alternate between two destinations for acme-v02.api.letsencrypt.org: 2600:1415:8:185::3a8e , 2600:1415:8:192::3a8e which may be causing problems if the backend servers aren't synchronising properly. If I put in a single entry in my hosts file, I don't seem to get the error as often. IPv4 is reliable, but dns normally only returns one ip address for acme-v02.api.letsencrypt.org. Should I report this to Let's Encrypt since using the new nonce might be invalidated when curl connects to the other remote server again?
@staples1347 commented on GitHub (Jun 10, 2019):
Actually my problem might just be with my connection as when I try on other Linux servers using the same two destinations I'm not getting the error.
@alexzorin commented on GitHub (Sep 10, 2019):
ACME clients are supposed to transparently retry requests that fail due to an invalid nonce. This is explicitly mentioned in the spec (https://tools.ietf.org/html/rfc8555#section-6.5):
Whether or not this is caused by NAT, multiple IP addresses, or server-side goings on, users should not even notice that it is happening. You can look at clients like acme.sh or Certbot to see how they handle this.
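The retry behaviour described above could look roughly like this. It is a sketch under stated assumptions, not how dehydrated, acme.sh or Certbot implement it: `request_with_nonce_retry` and `extract_replay_nonce` are hypothetical names, and the caller supplies a command that signs and sends the request with a given nonce and prints the raw response (headers and body).

```shell
# Print the Replay-Nonce header value from a raw HTTP response on stdin.
extract_replay_nonce() {
  awk '{ sub(/\r$/, "") }
       tolower($0) ~ /^replay-nonce:/ { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# request_with_nonce_retry SEND_FN NONCE [MAX_TRIES]
# SEND_FN is a caller-supplied command (hypothetical) that takes a nonce,
# sends the signed request, and prints the raw response.
request_with_nonce_retry() {
  send_fn="$1"
  nonce="$2"
  max_tries="${3:-3}"
  try=1
  while [ "$try" -le "$max_tries" ]; do
    response="$("$send_fn" "$nonce")"
    case "$response" in
      *badNonce*) ;;  # rejected: pick up the fresh nonce below and try again
      *) printf '%s\n' "$response"; return 0 ;;
    esac
    nonce="$(printf '%s\n' "$response" | extract_replay_nonce)"
    [ -n "$nonce" ] || break
    try=$((try + 1))
  done
  printf '%s\n' "$response" >&2
  return 1
}
```

The attempt limit matters: when the network path changes on every request, as reported earlier in this thread, each retry can fail the same way, so the loop has to give up eventually.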
@m-a-v commented on GitHub (Sep 27, 2019):
I've had the same problem. After updating from 0.6.3 to 0.6.5 the problem disappeared. Probably this helps someone else.
@altasnet commented on GitHub (Oct 2, 2019):
I'm getting an error when I try to register and accept terms. I've already checked that I'm using just one IP address. I'm using version 0.6.5
dehydrated configuration
INFO: Using main config file /shared/dehydrated/config
declare -- CA="https://acme-staging.api.letsencrypt.org/directory"
declare -- CERTDIR="/shared/dehydrated/certs"
declare -- ALPNCERTDIR="/shared/dehydrated/alpn-certs"
declare -- CHALLENGETYPE="http-01"
declare -- DOMAINS_D=""
declare -- DOMAINS_TXT="/shared/dehydrated/domains.txt"
declare -- HOOK="/shared/dehydrated/hook.sh"
declare -- HOOK_CHAIN="no"
declare -- RENEW_DAYS="30"
declare -- KEYSIZE="2048"
declare -- WELLKNOWN="/shared/dehydrated"
declare -- PRIVATE_KEY_RENEW="yes"
declare -- OPENSSL_CNF="/etc/pki/tls/openssl.cnf"
declare -- CONTACT_EMAIL=""
declare -- LOCKFILE="/shared/dehydrated/lock"
INFO: Using main config file /shared/dehydrated/config
Dehydrated by Lukas Schauer
https://dehydrated.io
Dehydrated version: 0.6.5
GIT-Revision: unknown
OS: BIG-IP 14.1.0.3 Build 0.0.6
Used software:
bash: 4.2.46(2)-release
curl: curl 7.47.1
awk: GNU Awk 4.0.2
sed: sed (GNU sed) 4.2.2
mktemp: mktemp (GNU coreutils) 8.22
grep: grep (GNU grep) 2.20
diff: diff (GNU diffutils) 3.3
openssl: OpenSSL 1.0.2p-fips 14 Aug 2018
INFO: Using main config file /shared/dehydrated/config
Details:
HTTP/2.0 400
server:nginx
date:Wed, 02 Oct 2019 21:15:32 GMT
content-type:application/problem+json
content-length:100
cache-control:public, max-age=0, no-cache
replay-nonce:0002_xoIBQHkeneUKKhLjCGvLu2pNl-Me7aP-dTwuVkTtBU
{
"type": "urn:acme:error:badNonce",
"detail": "JWS has no anti-replay nonce",
"status": 400
}
Error registering account key. See message above for more information.
@KamilKeski commented on GitHub (Oct 3, 2019):
@altasnet Noticed you aren't declaring "CA_TERMS", that's going to be required for the correct environment (staging or prod) to register a new account. You are using the staging cert authority, so it may be defaulting to the prod license terms if that isn't defined, and generating an invalid nonce for that reason. Just a thought.
declare -- CA_TERMS="https://acme-staging.api.letsencrypt.org/terms"
@altasnet commented on GitHub (Oct 3, 2019):
Thank you for your time!
We get the same error in production:
Details:
HTTP/2.0 400
server:nginx
date:Thu, 03 Oct 2019 13:04:33 GMT
content-type:application/problem+json
content-length:112
cache-control:public, max-age=0, no-cache
link:https://acme-v02.api.letsencrypt.org/directory;rel="index"
replay-nonce:0002JNQGAJOKMNtHGIDom3Mth9pEqsTPh7C3_zivlpEyN2k
{
"type": "urn:ietf:params:acme:error:badNonce",
"detail": "JWS has no anti-replay nonce",
"status": 400
}
@Chupaka commented on GitHub (Oct 3, 2019):
@altasnet just a note:
"JWS has no anti-replay nonce"
and
"JWS has invalid anti-replay nonce"
are different errors.
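To make the distinction concrete, here is a small sketch that tells the two messages apart in a saved response body. `classify_nonce_error` is a hypothetical name, and the interpretation in the comments is a reading of the message text, not something confirmed by the server's documentation in this thread.

```shell
# Classify an ACME problem document read from stdin by its "detail" text.
classify_nonce_error() {
  body="$(cat)"
  case "$body" in
    *"no anti-replay nonce"*)      echo "missing" ;;  # the request carried no nonce at all
    *"invalid anti-replay nonce"*) echo "invalid" ;;  # a nonce was sent but not accepted
    *)                             echo "other" ;;
  esac
}
```

A "missing" result points at the client failing to obtain or attach a nonce, while "invalid" matches the network-path problems discussed earlier in the thread.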
@altasnet commented on GitHub (Oct 3, 2019):
I didn't get invalid anti-replay, it's always no anti-replay.
Do you have any idea what could it be?
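One detail in the responses pasted above is that the header is printed as `replay-nonce:<value>`, lower-case and with no space after the colon. As a guess that this thread does not confirm, a client that extracts the nonce by splitting on colon-plus-space would come away with an empty value and then send a request with no nonce. The sketch below only demonstrates that parsing difference; the strict parser is hypothetical and not taken from dehydrated's source.

```shell
# Header as it appears in the production response above (no space after the colon).
header='replay-nonce:0002JNQGAJOKMNtHGIDom3Mth9pEqsTPh7C3_zivlpEyN2k'

# A strict parser (hypothetical) that splits on ": " finds no second field.
strict="$(printf '%s\n' "$header" | awk -F ': ' '{ print $2 }')"

# A tolerant parser accepts any case and optional whitespace after the colon.
tolerant="$(printf '%s\n' "$header" |
  awk 'tolower($0) ~ /^replay-nonce:/ { sub(/^[^:]*:[ \t]*/, ""); print }')"

printf 'strict=[%s] tolerant=[%s]\n' "$strict" "$tolerant"
```

If this is what happens, it would be consistent with the `--http1.1` workaround reported further down, since the header format here comes from an HTTP/2 response.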
@javimox commented on GitHub (Oct 4, 2019):
Same here:
@timdev commented on GitHub (Oct 21, 2019):
FWIW, I encountered "JWS has no anti-replay nonce" today. Eventually stumbled upon this thread, and solved the issue on my machine by adding `CURL_OPTS="--http1.1"` to my dehydrated config file.
@lukas2511 commented on GitHub (Dec 10, 2020):
This may be magically "fixed" when dehydrated at some point gets retry logic. Until then please just fix your network configuration.