[Bug] (SSH) Policy does not select any of the two when duplicate users are present #1103

Closed
opened 2025-12-29 02:28:17 +01:00 by adam · 2 comments
Owner

Originally created by @almereyda on GitHub (Sep 23, 2025).

Is this a support request?

  • This is not a support request

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When upgrading directly from v0.23.* to v0.26.*, a non-optional migration is not applied, because it was removed in between these versions with #2411.

The release notes contained messages about the coming and going of the feature gate in very short succession, but did not mark the changes as breaking.

Expected Behavior

Users are not duplicated and all policies using user@ or group:name statements continue to work as is.

Potentially breaking changes are marked as such, next to the others, in the release notes.

Steps To Reproduce

  1. Upgrade v0.23.z to v0.26.1
  2. Find multiple users for yourself
  3. Use a user-based src for the SSH policy and see it not working.
  4. Use a group-based src for the SSH policy and see it not working.
  5. Use only tags for src in the SSH policy and see it working.

Environment

- OS: many
- Headscale version: v0.26.1
- Tailscale version: many, > 1.62

Runtime environment

  • Headscale is behind a (reverse) proxy
  • Headscale runs in a container

Debug information

tailscale status --json | jq '.User.[] | select(.LoginName | contains("yala"))'
{
  "ID": 2,
  "LoginName": "yala",
  "DisplayName": "yala"
}
{
  "ID": 7,
  "LoginName": "yala@example.org",
  "DisplayName": "Jon Richter",
  "ProfilePicURL": "https://secure.gravatar.com/avatar/ad34a18e88b8641abb52887a530ccbd0154da1e120b271b36f553bdbe4d174e0?s=80&d=identicon"
}
$ tailscale status | grep yala | awk '{print $3}' | sort -u
yala
yala@
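
The duplication can also be spotted programmatically. The following is a minimal sketch, assuming the user records from `tailscale status --json` have already been reduced to a list of objects like the ones above; `find_duplicate_logins` is a hypothetical helper, not part of Tailscale or Headscale.

```python
from collections import defaultdict


def find_duplicate_logins(users):
    """Group user records by the local part of LoginName (text before '@')
    and return only the groups that contain more than one user ID."""
    by_local = defaultdict(list)
    for user in users:
        local = user["LoginName"].split("@", 1)[0]
        by_local[local].append(user["ID"])
    return {name: sorted(ids) for name, ids in by_local.items() if len(ids) > 1}


# Sample records copied from the debug output above.
users = [
    {"ID": 2, "LoginName": "yala", "DisplayName": "yala"},
    {"ID": 7, "LoginName": "yala@example.org", "DisplayName": "Jon Richter"},
]

duplicates = find_duplicate_logins(users)
print(duplicates)
```

Any entry in the result points at a legacy user and an OIDC user that share the same local name.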

  1. Policy trying to select the src per user
{
  "groups": {
    "group:yala": ["yala@"],
  },
  "tagOwners": {
    "tag:yala": ["group:yala"],
  },
  "acls": [
    // Allow all connections.
    { "action": "accept", "src": ["*"], "dst": ["*:*"] },
  ],
  "ssh": [
    {
      "action": "accept",
      "src": ["yala@"],
      "dst": ["tag:yala"],
      "users": ["root", "ubuntu", "pi", "yala"]
    }
  ],
  "disableIPv4":false,
  "randomizeClientPort":false
}
  2. Policy selecting by group
{
  "groups": {
    "group:yala": ["yala@"],
  },
  "tagOwners": {
    "tag:yala": ["group:yala"],
  },
  "acls": [
    // Allow all connections.
    { "action": "accept", "src": ["*"], "dst": ["*:*"] },
  ],
  "ssh": [
    {
      "action": "accept",
      "src": ["group:yala"],
      "dst": ["tag:yala"],
      "users": ["root", "ubuntu", "pi", "yala"]
    }
  ],
  "disableIPv4":false,
  "randomizeClientPort":false
}
  3. Policy with redundant tag
{
  "groups": {
    "group:yala": ["yala@"],
  },
  "tagOwners": {
    "tag:yala0": ["group:yala"],
    "tag:yala": ["group:yala"],
  },
  "acls": [
    // Allow all connections.
    { "action": "accept", "src": ["*"], "dst": ["*:*"] },
  ],
  "ssh": [
    {
      "action": "accept",
      "src": ["tag:yala0"],
      "dst": ["tag:yala"],
      "users": ["root", "ubuntu", "pi", "yala"]
    }
  ],
  "disableIPv4":false,
  "randomizeClientPort":false
}

All of this, despite using the redundant tag, will under the given circumstances of duplicate old and new users lead to empty rules[n].principals on /debug/ssh/. I don't know if it's correlation or causation in the statement. I have not tried to migrate most nodes from the old user to the new one manually, as #2417 kept me busy as well.

Moving a single device and assigning it a new user id together with using the tag-workaround helped accessing it again, though. The single step to only reassign the user to the OIDC identity with @ was not sufficient, possibly due to the policy not resolving yala@ to any of the two fitting identities (this issue).
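
To illustrate the hypothesis only: the following is a toy model, not Headscale's actual resolution code. It assumes a resolver that accepts a `user@` token only when exactly one identity matches. The field names (`id`, `name`, `email`) and the functions `matches` and `resolve_unique` are made up for this sketch.

```python
def matches(token, user):
    """Return True if a policy token such as 'yala@' could refer to this user.

    Assumption for this sketch: the token prefix matches either a bare
    user name or the local part of an email address.
    """
    prefix = token.rstrip("@")
    email_local = user.get("email", "").split("@", 1)[0]
    return user.get("name", "") == prefix or email_local == prefix


def resolve_unique(token, users):
    """Toy resolver: return the single matching user ID, or None when the
    token is ambiguous or unknown (which would leave principals empty)."""
    candidates = [u["id"] for u in users if matches(token, u)]
    return candidates[0] if len(candidates) == 1 else None


users = [
    {"id": 2, "name": "yala"},                           # legacy user
    {"id": 7, "name": "", "email": "yala@example.org"},  # OIDC user
]

print(resolve_unique("yala@", users))      # both users match, so nothing is selected
print(resolve_unique("yala@", users[1:]))  # only the OIDC user is left
```

Under this assumption, the token resolves again as soon as only one of the two identities remains, which would fit the observation that reassigning the user alone was not sufficient.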

This is close to:

  • #2641
  • #2674

and most likely caused by

  • #2411

following the resolution of

  • #2651

The closing message of that issue states:

I'm going to close this as most likely an issue of skipping versions which had changelog notes about step by step migration.

The need for special care in upgrading from 0.23 to 0.24, 0.25 and subsequently 0.26 was not marked as a breaking change, but communicated well:

Headscale v0.24.0 has an automatic migration feature, which is enabled by default (map_legacy_users: true). This will be disabled by default in a future version of Headscale – any unmigrated users will get new accounts.

Please note that map_legacy_users will be set to false by default in v0.25.0 and the migration mechanism will be removed in v0.26.0.

It was not perfectly clear from this announcement that this might have other unintended side effects, such as breaking the selection of principals in a (v2) policy.

v0.25 and v0.26 also don't mark the procedure as breaking, but list the contributions under Changes without explaining their impact.

  • v0.25 "oidc.map_legacy_users is now false by default"
  • v0.26 "oidc.map_legacy_users and oidc.strip_email_domain has been removed"

In short: the migration period was very short (2025-01-17 to 2025-02-11) and the indication for a necessary version-by-version migration from v0.23 to v0.26 via v0.24 or v0.25 with manually setting the variable was easily missed.
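
For scale, the length of that window can be computed from the two dates given above. A small sketch; the variable names are illustrative only.

```python
from datetime import date

window_start = date(2025, 1, 17)  # first date of the migration period stated above
window_end = date(2025, 2, 11)    # last date of the migration period stated above

window_days = (window_end - window_start).days
print(f"migration window: {window_days} days")
```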


Please don't get me wrong, I like that we have so many releases in such short intervals. I'm just asking for empathy and understanding that it was easy to miss the need for the migration, as it was never included in the existing Breaking section, where people focus their attention under the general daily pressure of maintaining many systems.


The question is this:

How can people who ended up in the unmigrated state on a newer version replay the removed migration in order to settle their systems into a more consolidated state?

adam added the bug label 2025-12-29 02:28:17 +01:00
adam closed this issue 2025-12-29 02:28:17 +01:00
Author
Owner

@almereyda commented on GitHub (Sep 23, 2025):

Due to the other side-effects around #2417, this one took a month to triage, debug and understand. Please excuse the delay in trying to keep up with the high-paced development of Headscale.

Author
Owner

@almereyda commented on GitHub (Dec 16, 2025):

A bug that follows from a bug during an unclean transition path.

With the migration path being clear and recovery being possible through different means, this can be closed.

It can be resolved manually with some renaming and remapping of identities, either cleanly via the CLI or directly in the database for the adventurous.
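
Before remapping, it can help to write down which nodes would move. The following is a minimal planning sketch, assuming node records have been exported into a simple list first; the record layout and `plan_remap` are hypothetical, and the sketch changes nothing in Headscale itself. The user IDs 2 and 7 are the ones from the debug output in the issue.

```python
def plan_remap(nodes, legacy_user_id, oidc_user_id):
    """Return (node_id, old_owner, new_owner) tuples for every node that is
    still owned by the legacy user."""
    return [
        (node["id"], legacy_user_id, oidc_user_id)
        for node in nodes
        if node["user_id"] == legacy_user_id
    ]


# Hypothetical export of node ownership.
nodes = [
    {"id": 1, "user_id": 2},
    {"id": 2, "user_id": 7},
    {"id": 3, "user_id": 2},
]

plan = plan_remap(nodes, legacy_user_id=2, oidc_user_id=7)
for node_id, old_owner, new_owner in plan:
    print(f"node {node_id}: user {old_owner} -> user {new_owner}")
```

The resulting list can then be applied by whichever route is preferred, CLI or database.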

Reference: starred/headscale#1103