[Bug] SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x860100 #910

Closed
opened 2025-12-29 02:25:49 +01:00 by adam · 5 comments
Owner

Originally created by @tacognito on GitHub (Jan 21, 2025).

Is this a support request?

  • This is not a support request

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

After upgrading to 0.24.0 I get panics in headscale like this:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x860100]
goroutine 18595 [running]:
github.com/juanfont/headscale/hscontrol/types.(*Node).Proto(0x0)
	/home/runner/work/headscale/headscale/hscontrol/types/node.go:239 +0x20
github.com/juanfont/headscale/hscontrol/types.Routes.Proto({0x400077bc08?, 0x0?, 0x1cf16f0?})
	/home/runner/work/headscale/headscale/hscontrol/types/routes.go:82 +0xb0
github.com/juanfont/headscale/hscontrol.headscaleV1APIServer.GetRoutes({{}, 0x0?}, {0x19f2180?, 0x1?}, 0x40014101c0?)
	/home/runner/work/headscale/headscale/hscontrol/grpcv1.go:558 +0x3c
github.com/juanfont/headscale/gen/go/headscale/v1._HeadscaleService_GetRoutes_Handler({0x1b9f9e0, 0x4000718328}, {0x1fa5c58, 0x4000c2e3f0}, 0x4001104480, 0x0)
	/home/runner/work/headscale/headscale/gen/go/headscale/v1/headscale_grpc.pb.go:806 +0x1c0
google.golang.org/grpc.(*Server).processUnaryRPC(0x4000680600, {0x1fa5c58, 0x4000c2e390}, 0x4000782420, 0x400070f3b0, 0x30b7678, 0x0)
	/home/runner/go/pkg/mod/google.golang.org/grpc@v1.69.0/server.go:1392 +0xc44
google.golang.org/grpc.(*Server).handleStream(0x4000680600, {0x1fa6418, 0x40001f41a0}, 0x4000782420)
	/home/runner/go/pkg/mod/google.golang.org/grpc@v1.69.0/server.go:1802 +0x910
google.golang.org/grpc.(*Server).serveStreams.func2.1()
	/home/runner/go/pkg/mod/google.golang.org/grpc@v1.69.0/server.go:1030 +0x84
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 102
	/home/runner/go/pkg/mod/google.golang.org/grpc@v1.69.0/server.go:1041 +0x13c
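
For context, the nil pointer dereference happens inside (*Node).Proto, which is called with a nil receiver (the 0x0 argument in the trace) while Routes.Proto iterates over routes whose node apparently no longer exists. A minimal sketch of that failure mode, using hypothetical, simplified types rather than headscale's actual structs:

```go
package main

// Hypothetical, simplified stand-ins for headscale's types, only to
// illustrate the failure mode seen in the trace above.
type Node struct {
	Hostname string
}

// Proto dereferences the receiver, so calling it on a nil *Node panics,
// which matches the (*Node).Proto(0x0) frame in the stack trace.
func (n *Node) Proto() string {
	return n.Hostname
}

type Route struct {
	Node *Node
}

func main() {
	// A "ghost" route whose node has been deleted ends up with a nil Node.
	routes := []Route{{Node: nil}}
	for _, r := range routes {
		_ = r.Node.Proto() // panic: invalid memory address or nil pointer dereference
	}
}
```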

The tailnet itself is working, in the sense that I can connect to all nodes as far as I can tell. However, on the host system "docker exec headscale headscale nodes list" shows almost all nodes as offline. Often I can't even run this command because Docker reports that the container is restarting, yet the tailnet keeps working. I just pinged a node on the tailnet that appeared offline; after a single timeout the ping succeeded, and another "docker exec headscale headscale nodes list" showed about half the nodes online. By this time the tailnet had been up for minutes.

This time, when I tried to get the nodes list, I got:

Error response from daemon: Container f56575937eba4f24b50471c1ea0521bc7fecda5705766dc40d11be2c983f048b is restarting, wait until the container is running

Tailnet still working.

Since the Go error message mentions routes, could it be related to #90?

How would I check that?

Expected Behavior

No panics. A reliable nodes list output. Being able to exec commands in the container.

Steps To Reproduce

Start headscale docker container and check logs.

Environment

- OS: Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
- Headscale version: 0.24.0
- Tailscale version: many different versions on mac, iphone, windows and linux machines

Runtime environment

  • Headscale is behind a (reverse) proxy
  • Headscale runs in a container

Anything else?

No response

adam added the bug label 2025-12-29 02:25:49 +01:00
adam closed this issue 2025-12-29 02:25:49 +01:00
Author
Owner

@tacognito commented on GitHub (Jan 21, 2025):

Ok yeah, seems to be the same issue. Had to work out how to check this, but luckily the database setup turned out to be super simple and thus clear.

For others coming across this, here is the SQL statement to identify the issue:
select distinct node_id from routes r where r.node_id not in (select id from nodes n)

And here the statement to safely fix it:

delete from routes r where r.node_id not in (select id from nodes n);

All behavior is back to normal. However, since other users apparently experience this too, it would be nice to add a check, or even a default "delete ghost routes" procedure at startup. Apparently the delete-node logic does not always delete the related routes with it, so I'd say that's a bug, right?
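
To illustrate what such a startup "delete ghost routes" step might look like, here is a minimal, hypothetical sketch using Go's database/sql against an SQLite database. The driver, database path, and table names are assumptions based on the queries above, not headscale's actual implementation:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumed SQLite driver; headscale may use something else
)

// deleteGhostRoutes removes routes whose node no longer exists,
// mirroring the manual cleanup query shown above.
func deleteGhostRoutes(db *sql.DB) (int64, error) {
	res, err := db.Exec(
		`DELETE FROM routes WHERE node_id NOT IN (SELECT id FROM nodes)`,
	)
	if err != nil {
		return 0, err
	}
	return res.RowsAffected()
}

func main() {
	// The path is an assumption; point this at the actual headscale database.
	db, err := sql.Open("sqlite3", "/var/lib/headscale/db.sqlite")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	n, err := deleteGhostRoutes(db)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("deleted %d ghost routes", n)
}
```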

Author
Owner

@kradalby commented on GitHub (Jan 22, 2025):

Sounds like same as #2352, will have a look

Author
Owner

@tacognito commented on GitHub (Jan 22, 2025):

Cool, I have a backup of the database that manifested this issue. I can share it if you want to.

Author
Owner

@kradalby commented on GitHub (Jan 22, 2025):

That would be great, my email is in my profile

Author
Owner

@tacognito commented on GitHub (Jan 22, 2025):

And you are right, I mentioned the wrong issue number. I had both open and referenced the wrong one, duh. I used the tip from @strobeltobias to zoom in on the ghost routes.

Reference: starred/headscale#910