Proposal: Implement TS2021 (Tailscale control protocol v2) #249

Closed
opened 2025-12-29 01:24:57 +01:00 by adam · 5 comments

Originally created by @juanfont on GitHub (Mar 26, 2022).

Tailscale clients communicate with the control server using Tailscale's control protocol. This is basically what Headscale implements.

It is based on an HTTP API with a bit of long polling, and a grain of NaCl (https://nacl.cr.yp.to/) to encrypt the JSON payloads. Since 2019 the protocol has remained mostly stable, with just some extra fields added to support new functionality like MagicDNS or Taildrop.

On our side, the core of the implementation is located in api.go (where the registration methods live) and poll.go (which holds the method that clients use to receive updates).


A couple of weeks ago the Tailscale team let us know (!) that they are working on version 2 of the control protocol, codenamed TS2021.

They have also been kind enough to a) share some internal documentation on the implementation (!!), and b) release code in https://github.com/tailscale/tailscale that helps us A LOT with the implementation (!!!!!).

They did this in order not to break Headscale. Very very very big kudos to them!

About TS2021

TS2021 is a Noise-based protocol (https://noiseprotocol.org/noise.html), using the IK pattern (https://noiseprotocol.org/noise.html#interactive-handshake-patterns-fundamental). It is the same cryptographic framework as the one used by Signal or WhatsApp.

We will not have to deal with Noise too much. For us, the very first step is a POST call to /ts2021, followed by an upgrade and hijack of the TCP connection. Then the code I mentioned above kicks in to create the Noise session. Once this is established, the API is reachable to the clients through what looks like an H2C server (essentially just the good old v1 API, but without NaCl encryption for the payloads).

From what we can see, as of late March 2022 they have not yet fully migrated all the API methods to use TS2021, so we will have to follow their changes gradually.

Our steps

  1. Prepare our API machinery (always wanted to use this word) to be able to deal with clients using TS2019 (no idea if they call it this way) and TS2021. This includes a minor change in the /key method, and removing NaCl for TS2021.

  2. Implement the /ts2021 handler (it's quite similar to what we do for the embedded DERP server)

  3. Plug an H2C server into the Noise connection under /ts2021 to expose our current API.

  4. Keep track of their CurrentCapabilityVersion, gradually enabling new API calls under TS2021

Current status

I have a prototype mostly working. I will clean it up a little bit and prepare a draft PR for scrutiny.

adam added the enhancement label 2025-12-29 01:24:57 +01:00
adam closed this issue 2025-12-29 01:24:57 +01:00

@danderson commented on GitHub (Mar 27, 2022):

One important thing to note on the cryptography side, which may not be in the docs you got (it was a later implementation question and I'm not sure I backported it into the specs): headscale must generate a new control key for use with Noise. It must not reuse the existing nacl keypair for Noise, even though the keys are technically cross-compatible (both are curve25519 keypairs).

This is to avoid cryptographic problems with key reuse across multiple protocols (nacl and noise). Our expert tells us that clients can reuse the same machine key for both protocols (important for compatibility), as long as the control plane uses different keys for nacl and noise.

See https://controlplane.tailscale.com/key?v=27 for how new clients retrieve both keypairs.
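As a rough illustration of the key-separation rule, the sketch below generates two independent Curve25519 keypairs with Go's `crypto/ecdh` (Go 1.20 or later) and exposes them through a `/key`-style handler. Everything about the response is an assumption made for the example: the JSON field names, the plain hex encoding, and the version threshold of 27 (taken only from the URL above) are not confirmed by this thread, and this is not Headscale's or Tailscale's code.

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"net/http"
	"strconv"
)

// assumedNoiseMinVersion is a guess based on the ?v=27 URL in the thread.
const assumedNoiseMinVersion = 27

// controlKeys holds two independent keypairs so that no private key is
// ever shared between the NaCl-based protocol and the Noise-based one.
type controlKeys struct {
	legacy *ecdh.PrivateKey // used only with the NaCl-encrypted API
	noise  *ecdh.PrivateKey // used only for the Noise handshake
}

// newControlKeys draws each keypair from fresh randomness.
func newControlKeys() (*controlKeys, error) {
	legacy, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	noise, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	return &controlKeys{legacy: legacy, noise: noise}, nil
}

// keyResponse is a hypothetical wire format for this sketch.
type keyResponse struct {
	LegacyPublicKey string `json:"legacyPublicKey"`
	PublicKey       string `json:"publicKey,omitempty"`
}

// keyResponseFor returns only the legacy key to old clients and both
// public keys to clients at or above the assumed threshold.
func keyResponseFor(keys *controlKeys, version int) keyResponse {
	resp := keyResponse{LegacyPublicKey: hex.EncodeToString(keys.legacy.PublicKey().Bytes())}
	if version >= assumedNoiseMinVersion {
		resp.PublicKey = hex.EncodeToString(keys.noise.PublicKey().Bytes())
	}
	return resp
}

// keyHandler serves the public keys; a missing or malformed "v" query
// parameter is treated as a legacy client.
func keyHandler(keys *controlKeys) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		version, err := strconv.Atoi(r.URL.Query().Get("v"))
		if err != nil {
			version = 0
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(keyResponseFor(keys, version))
	}
}
```

The point of the two-field struct is that the private halves never mix: the legacy key only ever decrypts NaCl payloads, and the Noise key only ever takes part in the handshake.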

Also, if you haven't already, I recommend using the control/controlbase and control/controlhttp packages in the tailscale repo to implement the transport. They take care of a bunch of the subtleties of upgrading to Noise and handshaking safely. The server-side APIs are also included in those packages.


@juanfont commented on GitHub (Mar 27, 2022):

Hey @danderson,

First, thank you so much for your message! Really appreciated!

And indeed, I was reusing the control key. I even left a comment wondering why you people were using two different keys.

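image (https://user-images.githubusercontent.com/181059/160261340-7cc07abb-cc3e-4bdf-a250-f8e563c56bc0.png).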

We mostly use controlbase and controlhttp, although I had to slightly modify AcceptHTTP to make Gin (the web framework we use) happy. I also found netutil.NewOneConnListener, which is quite convenient...

Again, thanks for your comment :)


@danderson commented on GitHub (Mar 27, 2022):

Could you @ me when the PR is up, so I can see the AcceptHTTP changes you needed? I'm wondering if we can fix it upstream without importing all of Gin.


@danderson commented on GitHub (Apr 7, 2022):

FYI, late change to the Noise protocol: https://github.com/tailscale/tailscale/pull/4370

We now use the client capability version as the Noise handshake version, instead of having a separate version for Noise. That means conn.Version() is the client capability version, and the server-side API changed a little bit to include the max supported protocol version, so the server can validate that it knows how to communicate correctly with a client. Aside from the API change, the controlbase/controlhttp packages handle the Noise internals for handshaking on the correct version, so hopefully not much difference as far as you're concerned.


@juanfont commented on GitHub (Aug 23, 2022):

This is done :)


Reference: starred/headscale#249