[PR #2628] [MERGED] mapper: produce map before poll #2774

Closed
opened 2025-12-29 04:18:55 +01:00 by adam · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/juanfont/headscale/pull/2628
Author: @kradalby
Created: 5/26/2025
Status: Merged
Merged: 7/28/2025
Merged by: @kradalby

Base: mainHead: kradalby/mapper-move


📝 Commits (6)

  • 61c8423 cmd/hi: add ability to collect resource stats
  • 392cda3 .github/workflows: fail tests if hs or ts is using too much memory
  • 30036cb .gitignore: .claude
  • 8c04ab7 .github/workflows: add generate check
  • 7177aeb CLAUDE.md: add project file, lets call it a test
  • 273dfd6 mapper: produce map before poll

📊 Changes

70 files changed (+5756 additions, -2460 deletions)

View changed files

📝 .dockerignore (+4 -0)
.github/workflows/check-generated.yml (+55 -0)
📝 .github/workflows/integration-test-template.yml (+1 -1)
📝 .gitignore (+7 -0)
📝 CHANGELOG.md (+2 -0)
CLAUDE.md (+395 -0)
📝 Makefile (+3 -4)
📝 cmd/headscale/cli/users.go (+0 -3)
📝 cmd/hi/docker.go (+69 -2)
📝 cmd/hi/doctor.go (+27 -4)
📝 cmd/hi/run.go (+3 -0)
cmd/hi/stats.go (+468 -0)
📝 flake.nix (+1 -1)
📝 go.mod (+16 -11)
📝 go.sum (+18 -16)
📝 hscontrol/app.go (+45 -85)
📝 hscontrol/auth.go (+48 -35)
📝 hscontrol/capver/capver.go (+3 -1)
📝 hscontrol/capver/capver_generated.go (+15 -16)
📝 hscontrol/capver/capver_test.go (+3 -4)

...and 50 more files

📄 Description

Before this patch, we would send a message to each "node stream"
that there is an update that needs to be turned into a mapresponse
and sent to a node.

Producing the mapresponse is a "costly" afair which means that while
a node was producing one, it might start blocking and creating full
queues from the poller and all the way up to where updates where sent.

This could cause updates to time out and being dropped as a bad node
going away or spending too time processing would cause all the other
nodes to not get any updates.

In addition, it contributed to "uncontrolled parallel processing" by
potentially doing too many expensive operations at the same time:

Each node stream is essentially a channel, meaning that if you have 30
nodes, we will try to process 30 map requests at the same time. If you
have 8 cpu cores, that will saturate all the cores immediately and cause
a lot of wasted switching between the processing.

Now, all the maps are processed by workers in the mapper, and the number
of workers are controlable. These would now be recommended to be a bit
less than number of CPU cores, allowing us to process them as fast as we
can, and then send them to the poll.

When the poll received the map, it is only responsible for taking it and
sending it to the node.

This might not directly improve the performance of Headscale, but it will
likely make the performance a lot more consistent. And I would argue the
design is a lot easier to reason about.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/juanfont/headscale/pull/2628 **Author:** [@kradalby](https://github.com/kradalby) **Created:** 5/26/2025 **Status:** ✅ Merged **Merged:** 7/28/2025 **Merged by:** [@kradalby](https://github.com/kradalby) **Base:** `main` ← **Head:** `kradalby/mapper-move` --- ### 📝 Commits (6) - [`61c8423`](https://github.com/juanfont/headscale/commit/61c842361c13a7b9385769fd93cd28775baaea12) cmd/hi: add ability to collect resource stats - [`392cda3`](https://github.com/juanfont/headscale/commit/392cda3afbe24add84ce83edbd8065c1d4ac96fc) .github/workflows: fail tests if hs or ts is using too much memory - [`30036cb`](https://github.com/juanfont/headscale/commit/30036cb80a58d8f068a1f618b2fe539afc7d52f9) .gitignore: .claude - [`8c04ab7`](https://github.com/juanfont/headscale/commit/8c04ab7e445b7cb668d6c881b09e40f82484124f) .github/workflows: add generate check - [`7177aeb`](https://github.com/juanfont/headscale/commit/7177aebd37ea9681f08b8c4bcfaa67d514dedc27) CLAUDE.md: add project file, lets call it a test - [`273dfd6`](https://github.com/juanfont/headscale/commit/273dfd678f030c85b1121c424dd29dca29556040) mapper: produce map before poll ### 📊 Changes **70 files changed** (+5756 additions, -2460 deletions) <details> <summary>View changed files</summary> 📝 `.dockerignore` (+4 -0) ➕ `.github/workflows/check-generated.yml` (+55 -0) 📝 `.github/workflows/integration-test-template.yml` (+1 -1) 📝 `.gitignore` (+7 -0) 📝 `CHANGELOG.md` (+2 -0) ➕ `CLAUDE.md` (+395 -0) 📝 `Makefile` (+3 -4) 📝 `cmd/headscale/cli/users.go` (+0 -3) 📝 `cmd/hi/docker.go` (+69 -2) 📝 `cmd/hi/doctor.go` (+27 -4) 📝 `cmd/hi/run.go` (+3 -0) ➕ `cmd/hi/stats.go` (+468 -0) 📝 `flake.nix` (+1 -1) 📝 `go.mod` (+16 -11) 📝 `go.sum` (+18 -16) 📝 `hscontrol/app.go` (+45 -85) 📝 `hscontrol/auth.go` (+48 -35) 📝 `hscontrol/capver/capver.go` (+3 -1) 📝 `hscontrol/capver/capver_generated.go` (+15 -16) 📝 `hscontrol/capver/capver_test.go` (+3 -4) _...and 50 more files_ </details> ### 📄 Description Before this patch, we would send a message to each "node stream" that there is an update that needs to be turned into a mapresponse and sent to a node. Producing the mapresponse is a "costly" afair which means that while a node was producing one, it might start blocking and creating full queues from the poller and all the way up to where updates where sent. This could cause updates to time out and being dropped as a bad node going away or spending too time processing would cause all the other nodes to not get any updates. In addition, it contributed to "uncontrolled parallel processing" by potentially doing too many expensive operations at the same time: Each node stream is essentially a channel, meaning that if you have 30 nodes, we will try to process 30 map requests at the same time. If you have 8 cpu cores, that will saturate all the cores immediately and cause a lot of wasted switching between the processing. Now, all the maps are processed by workers in the mapper, and the number of workers are controlable. These would now be recommended to be a bit less than number of CPU cores, allowing us to process them as fast as we can, and then send them to the poll. When the poll received the map, it is only responsible for taking it and sending it to the node. This might not directly improve the performance of Headscale, but it will likely make the performance a lot more consistent. And I would argue the design is a lot easier to reason about. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
adam added the pull-request label 2025-12-29 04:18:55 +01:00
adam closed this issue 2025-12-29 04:18:55 +01:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/headscale#2774