Previously, up notifications were sent whenever a service recovered, even if no down notification had been sent (e.g., when recovering before the failure threshold was met). This could confuse users who would receive "service is up" notifications without ever being notified of a problem. Now, recovery notifications are only sent when a prior down notification exists, ensuring notification pairs are always complete.
internal/health/monitor
Route health monitoring with configurable check intervals, retry policies, and notification integration.
Overview
Purpose
This package provides health monitoring for different route types in GoDoxy:
- Monitors service health via configurable check functions
- Tracks consecutive failures with configurable thresholds
- Sends notifications on status changes
- Provides last-seen tracking for idle detection
Primary Consumers
- internal/route/ - Route health monitoring
- internal/api/v1/metrics/ - Uptime poller integration
- WebUI - Health status display
Non-goals
- Health check execution itself (delegated to internal/health/check/)
- Alert routing (handled by internal/notif/)
- Automatic remediation
Stability
Internal package. Its public interfaces, including the HealthMonitor interface, are stable.
Public API
Types
type HealthCheckFunc func(url *url.URL) (result types.HealthCheckResult, err error)
HealthMonitor Interface
type HealthMonitor interface {
	Start(parent task.Parent) error
	Task() *task.Task
	Finish(reason any)
	UpdateURL(url *url.URL)
	URL() *url.URL
	Config() *types.HealthCheckConfig
	Status() types.HealthStatus
	Uptime() time.Duration
	Latency() time.Duration
	Detail() string
	Name() string
	String() string
	CheckHealth() (types.HealthCheckResult, error)
}
Monitor Creation (new.go)
// Create monitor for agent-proxied routes
func NewAgentProxiedMonitor(
	ctx context.Context,
	cfg types.HealthCheckConfig,
	url *url.URL,
) (HealthMonitor, error)

// Create monitor for Docker containers
func NewDockerHealthMonitor(
	ctx context.Context,
	cfg types.HealthCheckConfig,
	url *url.URL,
	containerID string,
) (HealthMonitor, error)

// Create monitor for HTTP routes
func NewHTTPMonitor(
	ctx context.Context,
	cfg types.HealthCheckConfig,
	url *url.URL,
) HealthMonitor

// Create monitor for H2C (HTTP/2 cleartext) routes
func NewH2CMonitor(
	ctx context.Context,
	cfg types.HealthCheckConfig,
	url *url.URL,
) HealthMonitor

// Create monitor for file server routes
func NewFileServerMonitor(
	cfg types.HealthCheckConfig,
	url *url.URL,
) HealthMonitor

// Create monitor for stream routes
func NewStreamMonitor(
	cfg types.HealthCheckConfig,
	url *url.URL,
) HealthMonitor

// Unified monitor factory (routes to appropriate type)
func NewMonitor(
	ctx context.Context,
	cfg types.HealthCheckConfig,
	url *url.URL,
) (HealthMonitor, error)
Architecture
Monitor Selection Flow
flowchart TD
A[NewMonitor route] --> B{IsAgent route?}
B -->|true| C[NewAgentProxiedMonitor]
B -->|false| D{IsDocker route?}
D -->|true| E[NewDockerHealthMonitor]
D -->|false| F{Has h2c scheme?}
F -->|true| G[NewH2CMonitor]
F -->|false| H{Has http/https scheme?}
H -->|true| I[NewHTTPMonitor]
H -->|false| J{Is file:// scheme?}
J -->|true| K[NewFileServerMonitor]
J -->|false| L[NewStreamMonitor]
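The decision order in the flowchart can be sketched as a small function. This is an illustration only: the route struct and selectMonitor are hypothetical, and the function returns the constructor name instead of building a monitor, since the real route type is not shown here.

```go
package main

import (
	"fmt"
	"net/url"
)

// route is a hypothetical descriptor used only for this sketch.
type route struct {
	IsAgent  bool
	IsDocker bool
	URL      *url.URL
}

// selectMonitor returns the name of the constructor the flowchart would pick.
// Agent and Docker routes are tested first; the URL scheme decides the rest.
func selectMonitor(r route) string {
	switch {
	case r.IsAgent:
		return "NewAgentProxiedMonitor"
	case r.IsDocker:
		return "NewDockerHealthMonitor"
	}
	switch r.URL.Scheme {
	case "h2c":
		return "NewH2CMonitor"
	case "http", "https":
		return "NewHTTPMonitor"
	case "file":
		return "NewFileServerMonitor"
	default:
		return "NewStreamMonitor"
	}
}

// mustParse panics on malformed input; acceptable for fixed demo URLs.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	for _, raw := range []string{"h2c://app:8080", "https://app", "file:///srv/www", "tcp://db:5432"} {
		fmt.Println(raw, "->", selectMonitor(route{URL: mustParse(raw)}))
	}
}
```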
Monitor State Machine
stateDiagram-v2
[*] --> Starting: First check
Starting --> Healthy: Check passes
Starting --> Unhealthy: Check fails
Healthy --> Unhealthy: 5 consecutive failures
Healthy --> Error: Check error
Error --> Healthy: Check passes
Error --> Unhealthy: 5 consecutive failures
Unhealthy --> Healthy: Check passes
Unhealthy --> Error: Check error
[*] --> Stopped: Task cancelled
Component Structure
classDiagram
class monitor {
-service string
-config types.HealthCheckConfig
-url synk.Value~*url.URL~
-status synk.Value~HealthStatus~
-lastResult synk.Value~HealthCheckResult~
-checkHealth HealthCheckFunc
-startTime time.Time
-task *task.Task
+Start(parent task.Parent)
+CheckHealth() (HealthCheckResult, error)
+Status() HealthStatus
+Uptime() time.Duration
+Latency() time.Duration
+Detail() string
}
class HealthMonitor {
<<interface>>
+Start(parent task.Parent)
+Task() *task.Task
+Status() HealthStatus
}
Configuration Surface
HealthCheckConfig
type HealthCheckConfig struct {
	Interval    time.Duration // Check interval (default: 30s)
	Timeout     time.Duration // Check timeout (default: 10s)
	Path        string        // Health check path
	Method      string        // HTTP method (GET/HEAD)
	Retries     int           // Consecutive failures before notification (-1 for immediate)
	BaseContext func() context.Context
}
Defaults
| Field | Default |
|---|---|
| Interval | 30s |
| Timeout | 10s |
| Method | GET |
| Path | "/" |
| Retries | 3 |
Applying Defaults
cfg.ApplyDefaults(state.Value().Defaults.HealthCheck)
Dependency and Integration Map
Internal Dependencies
- internal/task/task.go - Lifetime management
- internal/notif/ - Status change notifications
- internal/health/check/ - Health check implementations
- internal/types/ - Health status types
- internal/config/types/ - Working state
External Dependencies
- github.com/puzpuzpuz/xsync/v4 - Atomic values
Observability
Logs
| Level | When |
|---|---|
| Info | Service comes up |
| Warn | Service goes down |
| Error | Health check error |
| Error | Monitor stopped after 5 trials |
Notifications
- Service up notification (with latency)
- Service down notification (with last seen time)
- Immediate notification when Retries < 0
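The notification rules can be sketched as a small gate: a down notification fires once the consecutive failure count reaches the threshold (or immediately when Retries < 0), and an up notification fires only if a down notification was sent before. The notifier type and observe method are hypothetical names, and the exact threshold comparison is an assumption.

```go
package main

import "fmt"

// notifier (hypothetical) decides when a status change should be announced.
type notifier struct {
	retries  int  // threshold; negative means notify on the first failure
	failures int  // current run of consecutive failures
	downSent bool // whether a down notification is outstanding
}

// observe records one check outcome and returns "down", "up", or ""
// when nothing should be sent.
func (n *notifier) observe(healthy bool) string {
	if healthy {
		n.failures = 0
		if n.downSent {
			// Recovery is announced only to close an earlier "down".
			n.downSent = false
			return "up"
		}
		return ""
	}
	n.failures++
	if !n.downSent && (n.retries < 0 || n.failures >= n.retries) {
		n.downSent = true
		return "down"
	}
	return ""
}

func main() {
	n := &notifier{retries: 3}
	// Two failures and a recovery stay silent; three failures trigger "down".
	for _, healthy := range []bool{false, false, true, false, false, false, true} {
		fmt.Printf("healthy=%v event=%q\n", healthy, n.observe(healthy))
	}
}
```

A service that recovers before reaching the threshold produces no notification at all, so every "up" has a matching "down".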
Metrics
- Consecutive failure count
- Last check latency
- Monitor uptime
Failure Modes and Recovery
| Failure Mode | Impact | Recovery |
|---|---|---|
| 5 consecutive check errors | Monitor enters Error state, task stops | Manual restart required |
| Health check function panic | Monitor crashes | Automatic cleanup |
| Context cancellation | Monitor stops gracefully | Stopped state |
| URL update to invalid | Check will fail | Manual URL fix |
Status Transitions
| From | To | Condition |
|---|---|---|
| Starting | Healthy | Check passes |
| Starting | Unhealthy | Check fails |
| Healthy | Unhealthy | Retries consecutive failures |
| Healthy | Error | Check returns error |
| Unhealthy | Healthy | Check passes |
| Error | Healthy | Check passes |
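The table above can be read as a pure transition function. The following sketch encodes only the listed rows and leaves the state unchanged otherwise; the status and outcome types, the next function, and the ">= retries" comparison are assumptions made for illustration.

```go
package main

import "fmt"

type status string

const (
	starting  status = "starting"
	healthy   status = "healthy"
	unhealthy status = "unhealthy"
	errored   status = "error"
)

type outcome int

const (
	pass     outcome = iota // check succeeded
	fail                    // check ran and reported the target down
	checkErr                // check itself returned an error
)

// next returns the status after one check. failures is the number of
// consecutive failures including the current one.
func next(cur status, o outcome, failures, retries int) status {
	switch {
	case o == pass:
		return healthy
	case o == fail && cur == starting:
		return unhealthy
	case o == fail && cur == healthy && failures >= retries:
		return unhealthy
	case o == checkErr && cur == healthy:
		return errored
	}
	return cur // no matching row: keep the current status
}

func main() {
	s := starting
	s = next(s, pass, 0, 3)
	fmt.Println(s)
	for f := 1; f <= 3; f++ {
		s = next(s, fail, f, 3)
		fmt.Println(s)
	}
}
```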
Usage Examples
Creating an HTTP Monitor
cfg := types.HealthCheckConfig{
	Interval: 15 * time.Second,
	Timeout:  5 * time.Second,
	Path:     "/health",
	Retries:  3,
}
// Use names that do not shadow the url and monitor packages.
u, err := url.Parse("http://localhost:8080")
if err != nil {
	return err
}
mon := monitor.NewHTTPMonitor(context.Background(), cfg, u)
// parent is the caller's task.Parent.
if err := mon.Start(parent); err != nil {
	return err
}
// Check status
fmt.Printf("Status: %s\n", mon.Status())
fmt.Printf("Latency: %v\n", mon.Latency())
Creating a Docker Monitor
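// cfg, u (*url.URL), containerID, and parent are assumed to be defined by the caller.
mon, err := monitor.NewDockerHealthMonitor(
	context.Background(),
	cfg,
	u,
	containerID,
)
if err != nil {
	return err
}
if err := mon.Start(parent); err != nil {
	return err
}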
Unified Factory
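// ctx, cfg, u (*url.URL), and parent are assumed to be defined by the caller.
mon, err := monitor.NewMonitor(ctx, cfg, u)
if err != nil {
	return err
}
if err := mon.Start(parent); err != nil {
	return err
}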
Testing Notes
- monitor_test.go - Monitor lifecycle tests
- Mock health check functions for deterministic testing
- Status transition coverage tests
- Notification trigger tests