zerolog panics "index out of range" #550

Closed
opened 2025-12-29 02:19:53 +01:00 by adam · 3 comments

Originally created by @kev-the-dev on GitHub (Sep 5, 2023).

Bug description

I've been seeing crashes related to the logger in Headscale with the following stack traces (line numbers may be slightly different due to the patches I'm running).

It seems that zerolog can hit a race condition when multiple goroutines call `Msg` on the same event at around the same time. See https://github.com/rs/zerolog/issues/283. There may be many places in the Headscale code where this race condition is exposed...
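For illustration, here is a minimal, hypothetical sketch (not Headscale code) of the kind of misuse described in rs/zerolog#283: a single `*zerolog.Event` shared between goroutines. Once one goroutine finishes the event with `Msg`, the event's buffer is recycled, and the other goroutine's writes can panic inside `Encoder.AppendKey` with `index out of range [-1]`, much like the traces below:

```go
// Illustrative only: NOT Headscale code. It shows the misuse pattern
// discussed in rs/zerolog#283, where one *zerolog.Event is shared across
// goroutines. Events are not safe for concurrent use and must not be
// touched after Msg/Send has been called on them; doing so can corrupt
// the event's buffer and panic in internal/json.Encoder.AppendKey.
package main

import (
	"os"
	"sync"

	"github.com/rs/zerolog"
)

func main() {
	logger := zerolog.New(os.Stdout)

	// BUG: the same event is finished concurrently by two goroutines.
	ev := logger.Info().Str("handler", "PollNetMap")

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Data race: both goroutines append fields to, and finish,
			// the same pooled event.
			ev.Str("channel", "Done").Msg("Closing updateChan channel")
		}()
	}
	wg.Wait()
}
```

The field names above simply mirror the log line in the trace for readability; the actual call sites would be wherever Headscale shares or reuses an event across goroutines.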

```
12:20PM TRC Closing updateChan channel channel=Done handler=PollNetMap machine=<redacted>
panic: runtime error: index out of range [-1]

goroutine 463766 [running]:
github.com/rs/zerolog/internal/json.Encoder.AppendKey(...)
        external/com_github_rs_zerolog/internal/json/base.go:15
github.com/rs/zerolog.(*Event).msg(0xc0002a18c0, {0xc002472240, 0x1d})
        external/com_github_rs_zerolog/event.go:144 +0x2f3
github.com/rs/zerolog.(*Event).Msg(...)
        external/com_github_rs_zerolog/event.go:108
github.com/juanfont/headscale.closeChanWithLog[...](0x563a95a03de0?, {0xc000c8f5f0?, 0x10}, {0x563a95be61f2, 0xd})
        external/com_github_juanfont_headscale/protocol_common_poll.go:696 +0x10c
github.com/juanfont/headscale.(*Headscale).scheduledPollWorker(0xc000278000, {0x563a95dadb50, _}, _, _, {0x3d, {0xc002f14464, 0x4}, 0x1, {{0xe8, ...}}, ...}, ...)
        external/com_github_juanfont_headscale/protocol_common_poll.go:649 +0x767
created by github.com/juanfont/headscale.(*Headscale).pollNetMapStream
        external/com_github_juanfont_headscale/protocol_common_poll.go:261 +0x2a6
```

```
goroutine 388590 [running]:
golang.org/x/net/http2.(*serverConn).runHandler.func1()
        external/org_golang_x_net/http2/server.go:2307 +0x145
panic({0x563a95b63ca0, 0xc001d39e78})
        GOROOT/src/runtime/panic.go:884 +0x213
github.com/rs/zerolog/internal/json.Encoder.AppendKey(...)
        external/com_github_rs_zerolog/internal/json/base.go:15
github.com/rs/zerolog.(*Event).Str(0xc0002a18c0?, {0x563a95bdfe23?, 0x0?}, {0xc002f02320?, 0x10?})
        external/com_github_rs_zerolog/event.go:254 +0x13d
github.com/juanfont/headscale.(*Headscale).pollNetMapStream(0xc000278000, {_, _}, {_, _}, _, {0x3d, {0xc0031759c4, 0x4}, 0x1, ...}, ...)
        external/com_github_juanfont_headscale/protocol_common_poll.go:577 +0x2546
github.com/juanfont/headscale.(*Headscale).handlePollCommon(0xc000278000, {0x563a95dac670, _}, {_, _}, _, {0x3d, {0xc0031759c4, 0x4}, 0x1, ...}, ...)
        external/com_github_juanfont_headscale/protocol_common_poll.go:223 +0x1257
github.com/juanfont/headscale.(*ts2021App).NoisePollNetMapHandler(0xc002711c90, {0x563a95dac670, 0xc0031a1f80}, 0xc0002aed00)
        external/com_github_juanfont_headscale/protocol_noise_poll.go:71 +0x35c
net/http.HandlerFunc.ServeHTTP(0xc0002aec00?, {0x563a95dac670?, 0xc0031a1f80?}, 0x1?)
        GOROOT/src/net/http/server.go:2122 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0015e89c0, {0x563a95dac670, 0xc0031a1f80}, 0xc0015c4500)
        external/com_github_gorilla_mux/mux.go:210 +0x1cf
golang.org/x/net/http2.(*serverConn).runHandler(0xc0003717a0?, 0x0?, 0xc0018d52c0?, 0x0?)
        external/org_golang_x_net/http2/server.go:2314 +0x83
created by golang.org/x/net/http2.(*serverConn).processHeaders
        external/org_golang_x_net/http2/server.go:2028 +0x68a
```

Environment

v0.22.1 / a5afe4bd06223f71a44ea7d42e9fb2ba3d178154 with some WIP patches (I don't see how these patches could affect the reported bug).

  • headscale runs in a container

To Reproduce

Unclear... it seems to be a rare race condition. Run headscale a lot with many nodes for a long period of time.

adam added the stale and bug labels 2025-12-29 02:19:53 +01:00
adam closed this issue 2025-12-29 02:19:53 +01:00

@github-actions[bot] commented on GitHub (Dec 11, 2023):

This issue is stale because it has been open for 90 days with no activity.


@github-actions[bot] commented on GitHub (Dec 19, 2023):

This issue was closed because it has been inactive for 14 days since being marked as stale.


@irl-segfault commented on GitHub (Aug 29, 2024):

@kev-the-dev did you guys ever figure out what the root cause was? Also running into this, and it doesn't seem like it's due to holding on to `zerolog.Event` references past a call to `Msg`. Thanks!
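
For context, the pattern being ruled out here (holding on to `zerolog.Event` references past a call to `Msg`) would look roughly like this hypothetical sketch, not taken from Headscale:

```go
// Illustrative only, not Headscale code: the "use after Msg" misuse.
// Once Msg/Send is called, the *zerolog.Event is returned to an internal
// pool; touching it afterwards can write into a buffer that now belongs
// to a different in-flight event and panic in Encoder.AppendKey.
package main

import (
	"os"

	"github.com/rs/zerolog"
)

func main() {
	logger := zerolog.New(os.Stdout)

	ev := logger.Info().Str("handler", "PollNetMap")
	ev.Msg("first message")

	// BUG: ev has already been finished and pooled; reusing it here is
	// undefined behavior from zerolog's point of view.
	ev.Str("channel", "Done").Msg("second message")
}
```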
