[Feature] Expose OIDC server status via monitoring #709

Open
opened 2025-12-29 02:22:43 +01:00 by adam · 2 comments
Owner

Originally created by @viq on GitHub (May 15, 2024).

Use case

This would make it easy to check what headscale thinks about the availability of the OIDC server, and to set up alerting based on that.

Description

Currently, if the OIDC server is not available, headscale (if configured to do so) will switch to "standard" mode and emit a message in the logs. But if someone doesn't look at the logs constantly, the only warning will come when registering a new machine: instead of being redirected to the OIDC login, they will be shown the commands to run on the server.

Headscale already has support for exposing some of its internal information to Prometheus. It would be great if this information were available there as well.

Contribution

  • I can write the design doc for this feature
  • I can contribute this feature

How can it be implemented?

Possibly metrics like `headscale_oidc_configured` or `headscale_oidc_info` with some information about the configured OIDC server, if any, and `headscale_oidc_functional` with `0` or `1` depending on what the internal checks show.
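
For illustration, a minimal sketch of how such gauges could be registered with prometheus/client_golang, which headscale already uses for its existing metrics; the metric names follow the suggestion above, while the `SetOIDCState` helper and the `issuer` label are assumptions, not existing headscale code:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	oidcConfigured = promauto.NewGauge(prometheus.GaugeOpts{
		Namespace: "headscale",
		Name:      "oidc_configured",
		Help:      "1 if an OIDC issuer is configured, 0 otherwise.",
	})

	oidcFunctional = promauto.NewGauge(prometheus.GaugeOpts{
		Namespace: "headscale",
		Name:      "oidc_functional",
		Help:      "1 if headscale's internal state considers the OIDC server usable, 0 otherwise.",
	})

	oidcInfo = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "headscale",
		Name:      "oidc_info",
		Help:      "Static information about the configured OIDC issuer (always 1 when set).",
	}, []string{"issuer"})
)

// SetOIDCState (hypothetical) would be called wherever headscale currently
// flips its internal OIDC availability state, e.g. after a failed issuer lookup.
func SetOIDCState(issuer string, functional bool) {
	oidcConfigured.Set(1)
	oidcInfo.WithLabelValues(issuer).Set(1)

	if functional {
		oidcFunctional.Set(1)
	} else {
		oidcFunctional.Set(0)
	}
}
```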

adam added the enhancement, no-stale-bot, OIDC, needs design doc labels 2025-12-29 02:22:43 +01:00
Author
Owner

@kradalby commented on GitHub (Sep 26, 2024):

I wonder if a `/health/oidc` endpoint might make more sense than a metric. My rationale is:

For a metric to be meaningful, we either need to poll OIDC on every scrape, which seems a bit "wrong" and might interfere with other metrics. Alternatively, we need a routine that checks all the time.

Querying a health endpoint, on the other hand, is more of a "please check if the OIDC server is available from your perspective", like what we do for the database here: https://github.com/juanfont/headscale/blob/main/hscontrol/handlers.go#L128

What do you think?
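
For illustration, a minimal sketch of what such an endpoint could look like, assuming the issuer is probed via go-oidc's discovery lookup; the handler name, the JSON bodies, and the wiring in `main` are hypothetical, not headscale's actual code:

```go
package main

import (
	"context"
	"net/http"
	"time"

	"github.com/coreos/go-oidc/v3/oidc"
)

// oidcHealthHandler actively probes the issuer's discovery document when the
// endpoint is hit, in the same spirit as the existing database health check:
// "please check if the OIDC server is available from your perspective".
func oidcHealthHandler(issuer string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if issuer == "" {
			http.Error(w, `{"status":"oidc not configured"}`, http.StatusNotFound)
			return
		}

		ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
		defer cancel()

		// oidc.NewProvider fetches /.well-known/openid-configuration, so a
		// failure here means the issuer is unreachable or misconfigured.
		if _, err := oidc.NewProvider(ctx, issuer); err != nil {
			http.Error(w, `{"status":"oidc unavailable"}`, http.StatusServiceUnavailable)
			return
		}

		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(`{"status":"ok"}`))
	}
}

func main() {
	// Hypothetical wiring: headscale would register this on its existing router.
	http.Handle("/health/oidc", oidcHealthHandler("https://idp.example.com"))
	_ = http.ListenAndServe(":8080", nil)
}
```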

Author
Owner

@viq commented on GitHub (Feb 1, 2025):

Sorry for the delay.

The current behaviour is that if the OIDC server is not working at any point, for any reason, headscale decides that it cannot use OIDC and requires a restart to change that. I would like to detect and expose that.

  1. I "don't care" what the status of OIDC server is. For that I'd be monitoring the OIDC server itself. There's little point for the scrape to cause headscale to reach out to it. What I do care about is what the internal state of headscale says about the state of OIDC server. Specifically, whether headscale requires a restart to remember that OIDC server exists and is to be used.
  2. I think metrics are easier to have a history of and to alert on. I may be wrong here, but that's my opinion about how to expose and consume this state.

> Alternatively we need a routine that checks all the time.

That sounds like something to tie into the internal state, telling headscale that it can start using the OIDC server again and thus removing the need for a restart. That would also remove my need for a metric telling me I have to restart headscale for OIDC to be usable.
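
For illustration, a minimal sketch of what such a routine could look like in Go, assuming headscale keeps an "OIDC usable" flag it consults at registration time; the `watchOIDC` function, the shared flag, and the probe via go-oidc's discovery lookup are all assumptions, not existing headscale code:

```go
package oidccheck

import (
	"context"
	"sync/atomic"
	"time"

	"github.com/coreos/go-oidc/v3/oidc"
)

// watchOIDC periodically re-probes the issuer's discovery document and
// updates a shared "OIDC usable" flag. Registration handlers could consult
// this flag instead of a one-shot decision made at startup, removing the
// need for a restart; a gauge such as headscale_oidc_functional could
// mirror the same value.
func watchOIDC(ctx context.Context, issuer string, interval time.Duration, usable *atomic.Bool) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			probeCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
			// oidc.NewProvider fetches /.well-known/openid-configuration,
			// so an error here means the issuer is unreachable or broken.
			_, err := oidc.NewProvider(probeCtx, issuer)
			cancel()

			usable.Store(err == nil)
		}
	}
}
```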
