[PR #2628] [MERGED] mapper: produce map before poll #2774

New Issue

adam · 2025-12-29T04:18:55+01:00

adam commented

2025-12-29 04:18:55 +01:00

📋 Pull Request Information

Original PR: https://github.com/juanfont/headscale/pull/2628
Author: @kradalby
Created: 5/26/2025
Status: ✅ Merged
Merged: 7/28/2025
Merged by: @kradalby

Base: main ← Head: kradalby/mapper-move

📝 Commits (6)

61c8423 cmd/hi: add ability to collect resource stats
392cda3 .github/workflows: fail tests if hs or ts is using too much memory
30036cb .gitignore: .claude
8c04ab7 .github/workflows: add generate check
7177aeb CLAUDE.md: add project file, lets call it a test
273dfd6 mapper: produce map before poll

📊 Changes

70 files changed (+5756 additions, -2460 deletions)

View changed files

📝 .dockerignore (+4 -0)
➕ .github/workflows/check-generated.yml (+55 -0)
📝 .github/workflows/integration-test-template.yml (+1 -1)
📝 .gitignore (+7 -0)
📝 CHANGELOG.md (+2 -0)
➕ CLAUDE.md (+395 -0)
📝 Makefile (+3 -4)
📝 cmd/headscale/cli/users.go (+0 -3)
📝 cmd/hi/docker.go (+69 -2)
📝 cmd/hi/doctor.go (+27 -4)
📝 cmd/hi/run.go (+3 -0)
➕ cmd/hi/stats.go (+468 -0)
📝 flake.nix (+1 -1)
📝 go.mod (+16 -11)
📝 go.sum (+18 -16)
📝 hscontrol/app.go (+45 -85)
📝 hscontrol/auth.go (+48 -35)
📝 hscontrol/capver/capver.go (+3 -1)
📝 hscontrol/capver/capver_generated.go (+15 -16)
📝 hscontrol/capver/capver_test.go (+3 -4)

...and 50 more files

📄 Description

Before this patch, we would send a message to each "node stream"
that there is an update that needs to be turned into a mapresponse
and sent to a node.

Producing the mapresponse is a "costly" afair which means that while
a node was producing one, it might start blocking and creating full
queues from the poller and all the way up to where updates where sent.

This could cause updates to time out and being dropped as a bad node
going away or spending too time processing would cause all the other
nodes to not get any updates.

In addition, it contributed to "uncontrolled parallel processing" by
potentially doing too many expensive operations at the same time:

Each node stream is essentially a channel, meaning that if you have 30
nodes, we will try to process 30 map requests at the same time. If you
have 8 cpu cores, that will saturate all the cores immediately and cause
a lot of wasted switching between the processing.

Now, all the maps are processed by workers in the mapper, and the number
of workers are controlable. These would now be recommended to be a bit
less than number of CPU cores, allowing us to process them as fast as we
can, and then send them to the poll.

When the poll received the map, it is only responsible for taking it and
sending it to the node.

This might not directly improve the performance of Headscale, but it will
likely make the performance a lot more consistent. And I would argue the
design is a lot easier to reason about.

_{🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.}

## 📋 Pull Request Information **Original PR:** https://github.com/juanfont/headscale/pull/2628 **Author:** [@kradalby](https://github.com/kradalby) **Created:** 5/26/2025 **Status:** ✅ Merged **Merged:** 7/28/2025 **Merged by:** [@kradalby](https://github.com/kradalby) **Base:** `main` ← **Head:** `kradalby/mapper-move` --- ### 📝 Commits (6) - [`61c8423`](https://github.com/juanfont/headscale/commit/61c842361c13a7b9385769fd93cd28775baaea12) cmd/hi: add ability to collect resource stats - [`392cda3`](https://github.com/juanfont/headscale/commit/392cda3afbe24add84ce83edbd8065c1d4ac96fc) .github/workflows: fail tests if hs or ts is using too much memory - [`30036cb`](https://github.com/juanfont/headscale/commit/30036cb80a58d8f068a1f618b2fe539afc7d52f9) .gitignore: .claude - [`8c04ab7`](https://github.com/juanfont/headscale/commit/8c04ab7e445b7cb668d6c881b09e40f82484124f) .github/workflows: add generate check - [`7177aeb`](https://github.com/juanfont/headscale/commit/7177aebd37ea9681f08b8c4bcfaa67d514dedc27) CLAUDE.md: add project file, lets call it a test - [`273dfd6`](https://github.com/juanfont/headscale/commit/273dfd678f030c85b1121c424dd29dca29556040) mapper: produce map before poll ### 📊 Changes **70 files changed** (+5756 additions, -2460 deletions) <details> <summary>View changed files</summary> 📝 `.dockerignore` (+4 -0) ➕ `.github/workflows/check-generated.yml` (+55 -0) 📝 `.github/workflows/integration-test-template.yml` (+1 -1) 📝 `.gitignore` (+7 -0) 📝 `CHANGELOG.md` (+2 -0) ➕ `CLAUDE.md` (+395 -0) 📝 `Makefile` (+3 -4) 📝 `cmd/headscale/cli/users.go` (+0 -3) 📝 `cmd/hi/docker.go` (+69 -2) 📝 `cmd/hi/doctor.go` (+27 -4) 📝 `cmd/hi/run.go` (+3 -0) ➕ `cmd/hi/stats.go` (+468 -0) 📝 `flake.nix` (+1 -1) 📝 `go.mod` (+16 -11) 📝 `go.sum` (+18 -16) 📝 `hscontrol/app.go` (+45 -85) 📝 `hscontrol/auth.go` (+48 -35) 📝 `hscontrol/capver/capver.go` (+3 -1) 📝 `hscontrol/capver/capver_generated.go` (+15 -16) 📝 `hscontrol/capver/capver_test.go` (+3 -4) _...and 50 more files_ </details> ### 📄 Description Before this patch, we would send a message to each "node stream" that there is an update that needs to be turned into a mapresponse and sent to a node. Producing the mapresponse is a "costly" afair which means that while a node was producing one, it might start blocking and creating full queues from the poller and all the way up to where updates where sent. This could cause updates to time out and being dropped as a bad node going away or spending too time processing would cause all the other nodes to not get any updates. In addition, it contributed to "uncontrolled parallel processing" by potentially doing too many expensive operations at the same time: Each node stream is essentially a channel, meaning that if you have 30 nodes, we will try to process 30 map requests at the same time. If you have 8 cpu cores, that will saturate all the cores immediately and cause a lot of wasted switching between the processing. Now, all the maps are processed by workers in the mapper, and the number of workers are controlable. These would now be recommended to be a bit less than number of CPU cores, allowing us to process them as fast as we can, and then send them to the poll. When the poll received the map, it is only responsible for taking it and sending it to the node. This might not directly improve the performance of Headscale, but it will likely make the performance a lot more consistent. And I would argue the design is a lot easier to reason about. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>

adam added the pull-request label 2025-12-29 04:18:55 +01:00

adam closed this issue

2025-12-29 04:18:55 +01:00

adam referenced this issue

2025-12-29 04:19:22 +01:00

[PR #2774] [MERGED] capver: update latest #2853

Sign in to join this conversation.

Branches Tags

main

update_flake_lock_action

gh-pages

kradalby/3038-reg-panic

kradalby/release-v0.27.2

dependabot/go_modules/golang.org/x/crypto-0.45.0

dependabot/go_modules/github.com/opencontainers/runc-1.3.3

copilot/investigate-headscale-issue-2788

copilot/investigate-visibility-issue-2788

copilot/investigate-issue-2833

copilot/debug-issue-2846

copilot/fix-issue-2847

dependabot/go_modules/github.com/go-viper/mapstructure/v2-2.4.0

dependabot/go_modules/github.com/docker/docker-28.3.3incompatible

kradalby/cli-experiement3

doc/0.26.1

doc/0.25.1

doc/0.25.0

doc/0.24.3

doc/0.24.2

doc/0.24.1

doc/0.24.0

kradalby/build-docker-on-pr

topic/docu-versioning

topic/docker-kos

juanfont/fix-crash-node-id

juanfont/better-disclaimer

update-contributors

topic/prettier

revert-1893-add-test-stage-to-docs

add-test-stage-to-docs

remove-node-check-interval

fix-empty-prefix

fix-ephemeral-reusable

bug_report-debuginfo

autogroups

logs-to-stderr

revert-1414-topic/fix_unix_socket

rename-machine-node

port-embedded-derp-tests-v2

port-derp-tests

duplicate-word-linter

update-tailscale-1.36

warn-against-apache

ko-fi-link

more-acl-tests

fix-typo-standalone

parallel-nolint

tparallel-fix

rerouting

ssh-changelog-docs

oidc-cleanup

web-auth-flow-tests

kradalby-gh-runner

fix-proto-lint

remove-funding-links

go-1.19

enable-1.30-in-tests

0.16.x

cosmetic-changes-integration

tmp-fix-integration-docker

fix-integration-docker

configurable-update-interval

show-nodes-online

hs2021

acl-syntax-fixes

ts2021-implementation

fix-spurious-updates

unstable-integration-tests

mandatory-stun

embedded-derp

prtemplate-fix

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: starred/headscale#2774