[Feature] Expose OIDC server status via monitoring #709

Open
opened 2025-12-29 02:22:43 +01:00 by adam · 2 comments
Owner

Originally created by @viq on GitHub (May 15, 2024).

Use case

This would make it easy to check what headscale thinks about the availability of the OIDC server, and to set up alerting based on that.

Description

Currently, if the OIDC server is not available, headscale (if configured to do so) will switch to "standard" mode and emit a message in the logs. But if someone doesn't look at the logs constantly, the only warning will come when registering a new machine: instead of being redirected to the OIDC login, they will be shown the commands to run on the server.

Headscale already has support for exposing some of its internal information to Prometheus. It would be great if this information were available there as well.

Contribution

  • I can write the design doc for this feature
  • I can contribute this feature

How can it be implemented?

Possibly metrics like `headscale_oidc_configured` or `headscale_oidc_info` with some information about the configured OIDC server, if any, and `headscale_oidc_functional` with `0` or `1` depending on what the internal checks show.
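
For illustration, a minimal sketch of how such gauges could be registered with prometheus/client_golang, which headscale already uses for its existing metrics; the metric names follow the suggestion above, while the `SetOIDCState` helper and the `issuer` label are assumptions, not existing headscale code:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	oidcConfigured = promauto.NewGauge(prometheus.GaugeOpts{
		Namespace: "headscale",
		Name:      "oidc_configured",
		Help:      "1 if an OIDC issuer is configured, 0 otherwise.",
	})

	oidcFunctional = promauto.NewGauge(prometheus.GaugeOpts{
		Namespace: "headscale",
		Name:      "oidc_functional",
		Help:      "1 if headscale's internal state considers the OIDC server usable, 0 otherwise.",
	})

	oidcInfo = promauto.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "headscale",
		Name:      "oidc_info",
		Help:      "Static information about the configured OIDC issuer (always 1 when set).",
	}, []string{"issuer"})
)

// SetOIDCState (hypothetical) would be called wherever headscale currently
// flips its internal OIDC availability state, e.g. after a failed issuer lookup.
func SetOIDCState(issuer string, functional bool) {
	oidcConfigured.Set(1)
	oidcInfo.WithLabelValues(issuer).Set(1)

	if functional {
		oidcFunctional.Set(1)
	} else {
		oidcFunctional.Set(0)
	}
}
```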

adam added the enhancement, no-stale-bot, OIDC, needs design doc labels 2025-12-29 02:22:43 +01:00
Author
Owner

@kradalby commented on GitHub (Sep 26, 2024):

I wonder if a `/health/oidc` endpoint might make more sense than a metric. My rationale is:

For a metric to be meaningful, we either need to poll OIDC on every scrape, which seems a bit "wrong" and might interfere with other metrics. Alternatively, we need a routine that checks all the time.

Querying a health endpoint, on the other hand, is more of a "please check if the OIDC server is available from your perspective", like what we do for the database here: https://github.com/juanfont/headscale/blob/main/hscontrol/handlers.go#L128

What do you think?
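
For illustration, a minimal sketch of what such an endpoint could look like, assuming the issuer is probed via go-oidc's discovery lookup; the handler name, the JSON bodies, and the wiring in `main` are hypothetical, not headscale's actual code:

```go
package main

import (
	"context"
	"net/http"
	"time"

	"github.com/coreos/go-oidc/v3/oidc"
)

// oidcHealthHandler actively probes the issuer's discovery document when the
// endpoint is hit, in the same spirit as the existing database health check:
// "please check if the OIDC server is available from your perspective".
func oidcHealthHandler(issuer string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if issuer == "" {
			http.Error(w, `{"status":"oidc not configured"}`, http.StatusNotFound)
			return
		}

		ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
		defer cancel()

		// oidc.NewProvider fetches /.well-known/openid-configuration, so a
		// failure here means the issuer is unreachable or misconfigured.
		if _, err := oidc.NewProvider(ctx, issuer); err != nil {
			http.Error(w, `{"status":"oidc unavailable"}`, http.StatusServiceUnavailable)
			return
		}

		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(`{"status":"ok"}`))
	}
}

func main() {
	// Hypothetical wiring: headscale would register this on its existing router.
	http.Handle("/health/oidc", oidcHealthHandler("https://idp.example.com"))
	_ = http.ListenAndServe(":8080", nil)
}
```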

Author
Owner

@viq commented on GitHub (Feb 1, 2025):

Sorry for the delay.

The current behaviour is that if the OIDC server is not working at any point, for any reason, headscale decides that it cannot use OIDC and requires a restart to change that. I would like to detect and expose that.

  1. I "don't care" what the status of OIDC server is. For that I'd be monitoring the OIDC server itself. There's little point for the scrape to cause headscale to reach out to it. What I do care about is what the internal state of headscale says about the state of OIDC server. Specifically, whether headscale requires a restart to remember that OIDC server exists and is to be used.
  2. I think metrics are easier to have a history of and to alert on. I may be wrong here, but that's my opinion about how to expose and consume this state.

> Alternatively we need a routine that checks all the time.

That sounds like something to tie into the internal state, telling headscale that it can start using the OIDC server again and thus removing the need for a restart. That would also remove my need for a metric telling me I have to restart headscale for OIDC to be usable.
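
For illustration, a minimal sketch of what such a routine could look like in Go, assuming headscale keeps an "OIDC usable" flag it consults at registration time; the `watchOIDC` function, the shared flag, and the probe via go-oidc's discovery lookup are all assumptions, not existing headscale code:

```go
package oidccheck

import (
	"context"
	"sync/atomic"
	"time"

	"github.com/coreos/go-oidc/v3/oidc"
)

// watchOIDC periodically re-probes the issuer's discovery document and
// updates a shared "OIDC usable" flag. Registration handlers could consult
// this flag instead of a one-shot decision made at startup, removing the
// need for a restart; a gauge such as headscale_oidc_functional could
// mirror the same value.
func watchOIDC(ctx context.Context, issuer string, interval time.Duration, usable *atomic.Bool) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			probeCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
			// oidc.NewProvider fetches /.well-known/openid-configuration,
			// so an error here means the issuer is unreachable or broken.
			_, err := oidc.NewProvider(probeCtx, issuer)
			cancel()

			usable.Store(err == nil)
		}
	}
}
```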
