Uptime
Tracks and aggregates route health status over time, providing uptime/downtime statistics and latency metrics.
Overview
The uptime package monitors route health status and calculates uptime percentages over configurable time periods. It integrates with the period package for historical storage and provides aggregated statistics for visualization.
Primary Consumers
- internal/api/v1/metrics - HTTP endpoint for uptime data
- internal/homepage - Dashboard uptime widgets
- Monitoring and alerting systems
Non-goals
- Does not perform health checks (handled by internal/route/routes)
- Does not provide alerting on downtime
- Does not persist data beyond the period package retention
- Does not aggregate across multiple GoDoxy instances
Stability
Internal package. Data format and API are stable.
Public API
Exported Types
StatusByAlias
type StatusByAlias struct {
Map map[string]routes.HealthInfoWithoutDetail `json:"statuses"`
Timestamp int64 `json:"timestamp"`
}
Container for health status of all routes at a specific time.
Status
type Status struct {
Status types.HealthStatus `json:"status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
Latency int32 `json:"latency"`
Timestamp int64 `json:"timestamp"`
}
Individual route status at a point in time.
RouteAggregate
type RouteAggregate struct {
Alias string `json:"alias"`
DisplayName string `json:"display_name"`
Uptime float32 `json:"uptime"`
Downtime float32 `json:"downtime"`
Idle float32 `json:"idle"`
AvgLatency float32 `json:"avg_latency"`
IsDocker bool `json:"is_docker"`
IsExcluded bool `json:"is_excluded"`
CurrentStatus types.HealthStatus `json:"current_status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
Statuses []Status `json:"statuses"`
}
Aggregated statistics for a single route.
Aggregated
type Aggregated []RouteAggregate
Slice of route aggregates, sorted alphabetically by alias.
Exported Variables
Poller
var Poller = period.NewPoller("uptime", getStatuses, aggregateStatuses)
Pre-configured poller for uptime metrics. Start with Poller.Start().
Unexported Functions
getStatuses
func getStatuses(ctx context.Context, _ StatusByAlias) (StatusByAlias, error)
Collects current status of all routes. Called by the period poller every second.
Returns:
- StatusByAlias - Map of all route statuses with the current timestamp
- error - Always nil (errors are logged internally)
aggregateStatuses
func aggregateStatuses(entries []StatusByAlias, query url.Values) (int, Aggregated)
Aggregates status entries into route statistics.
Query Parameters:
- period - Time filter (5m, 15m, 1h, 1d, 1mo)
- limit - Maximum number of routes to return (0 = all)
- offset - Offset for pagination
- keyword - Fuzzy search keyword for filtering routes
Returns:
- int - Total number of routes matching the query
- Aggregated - Slice of route aggregates
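The limit and offset semantics above can be sketched as follows. This is a minimal illustration, not the package's code: the paginate helper and its handling of an out-of-range offset are assumptions made for the example. Only the "0 = all" rule for limit and the (total, page) return shape come from the description above.

```go
package main

import "fmt"

// paginate is a hypothetical helper, not part of the uptime package.
// It returns the total number of items and the requested page.
// limit == 0 means "return everything from offset onward".
func paginate(items []string, limit, offset int) (int, []string) {
	total := len(items)
	if offset < 0 {
		offset = 0
	}
	if offset >= total {
		// Assumed behavior: an out-of-range offset yields an empty page.
		return total, nil
	}
	end := total
	if limit > 0 && offset+limit < total {
		end = offset + limit
	}
	return total, items[offset:end]
}

func main() {
	aliases := []string{"api", "blog", "db", "grafana", "wiki"}
	total, page := paginate(aliases, 2, 1)
	fmt.Println(total, page) // prints: 5 [blog db]
}
```

The total is computed before slicing, so a client can render page controls even when the returned page is shorter than limit.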
Architecture
Core Components
flowchart TD
subgraph Health Monitoring
Routes[Routes] -->|GetHealthInfoWithoutDetail| Status[Status Map]
Status -->|Polls every| Second[1 Second]
end
subgraph Poller
Poll[getStatuses] -->|Collects| StatusByAlias
StatusByAlias -->|Stores in| Period[Period StatusByAlias]
end
subgraph Aggregation
Query[Query Params] -->|Filters| Aggregate[aggregateStatuses]
Aggregate -->|Calculates| RouteAggregate
RouteAggregate -->|Uptime| UP[Uptime %]
RouteAggregate -->|Downtime| DOWN[Downtime %]
RouteAggregate -->|Idle| IDLE[Idle %]
RouteAggregate -->|Latency| LAT[Avg Latency]
end
subgraph Response
RouteAggregate -->|JSON| Client[API Client]
end
Data Flow
sequenceDiagram
participant Routes as Route Registry
participant Poller as Uptime Poller
participant Period as Period Storage
participant API as HTTP API
Routes->>Poller: GetHealthInfoWithoutDetail()
Poller->>Period: Add(StatusByAlias)
loop Every second
Poller->>Routes: Collect status
Poller->>Period: Store status
end
API->>Period: Get(filter)
Period-->>API: Entries
API->>API: aggregateStatuses()
API-->>Client: Aggregated JSON
Status Types
| Status | Description | Counted as Uptime? |
|---|---|---|
| healthy | Route is responding normally | Yes |
| unhealthy | Route is not responding | No |
| unknown | Status could not be determined | Excluded |
| napping | Route is in idle/sleep state | Idle (separate) |
| starting | Route is starting up | Idle (separate) |
Calculation Formula
For a set of status entries:
Uptime = healthy_count / total_count
Downtime = unhealthy_count / total_count
Idle = (napping_count + starting_count) / total_count
AvgLatency = sum(latency) / count
Note: unknown statuses are excluded from all calculations, so total_count counts only entries whose status is not unknown.
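A worked example of the formula, written as a standalone sketch rather than the package's actual aggregation code. The sample and ratios types and the compute function are hypothetical. The example also assumes that the count in AvgLatency is the same non-unknown total used for the other ratios, which the formula above does not state explicitly.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for one status entry of a route.
type sample struct {
	Status  string
	Latency int32
}

// ratios mirrors the Uptime, Downtime, Idle and AvgLatency fields
// of RouteAggregate.
type ratios struct {
	Uptime, Downtime, Idle, AvgLatency float32
}

// compute applies the formula to a list of samples.
func compute(samples []sample) ratios {
	var up, down, idle, total int
	var latencySum int64
	for _, s := range samples {
		switch s.Status {
		case "healthy":
			up++
		case "unhealthy":
			down++
		case "napping", "starting":
			idle++
		default:
			continue // unknown: excluded from all calculations
		}
		total++
		// Assumption: latency is averaged over every counted entry.
		latencySum += int64(s.Latency)
	}
	if total == 0 {
		return ratios{} // avoid division by zero when nothing was counted
	}
	n := float32(total)
	return ratios{
		Uptime:     float32(up) / n,
		Downtime:   float32(down) / n,
		Idle:       float32(idle) / n,
		AvgLatency: float32(latencySum) / n,
	}
}

func main() {
	r := compute([]sample{
		{"healthy", 40}, {"healthy", 60}, {"unhealthy", 0},
		{"napping", 0}, {"unknown", 0},
	})
	// prints: 0.50 0.25 0.25 25.00
	fmt.Printf("%.2f %.2f %.2f %.2f\n", r.Uptime, r.Downtime, r.Idle, r.AvgLatency)
}
```

With five samples of which one is unknown, the denominator is 4, so the three ratios sum to 1.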
Configuration Surface
No explicit configuration. The poller uses period package defaults:
| Parameter | Value |
|---|---|
| Poll Interval | 1 second |
| Retention | 5m, 15m, 1h, 1d, 1mo periods |
Dependency and Integration Map
Internal Dependencies
| Package | Purpose |
|---|---|
| internal/route/routes | Health info retrieval |
| internal/metrics/period | Time-bucketed storage |
| internal/types | HealthStatus enum |
| internal/metrics/utils | Query utilities |
External Dependencies
| Dependency | Purpose |
|---|---|
| github.com/lithammer/fuzzysearch/fuzzy | Keyword matching |
| github.com/bytedance/sonic | JSON marshaling |
Integration Points
- Route health monitors provide status via routes.GetHealthInfoWithoutDetail()
- Period poller handles data collection and storage
- HTTP API provides query interface via Poller.ServeHTTP
Observability
Logs
Poller lifecycle and errors are logged via zerolog.
Metrics
No metrics exposed directly. Status data available via API.
Failure Modes and Recovery
| Failure | Detection | Recovery |
|---|---|---|
| Route health monitor unavailable | Empty map returned | Log warning, continue |
| Invalid query parameters | aggregateStatuses returns empty | Return empty result |
| Poller panic | Goroutine crash | Process terminates |
| Persistence failure | Load/save error | Log, continue with empty state |
Fuzzy Search
The package uses fuzzy.MatchFold for keyword matching:
- Case-insensitive matching
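- Subsequence matching: the keyword's characters must appear in the target in order, though not necessarily adjacent (so plain substrings also match)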
- Fuzzy ranking
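To make the matching behavior concrete, here is a standard-library stand-in for a case-insensitive, in-order character match. It is not the library's implementation and is not a replacement for fuzzy.MatchFold; the matchFold function below is written only for this example, and its treatment of an empty keyword (matches everything) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// matchFold is an illustrative stand-in, not the fuzzysearch library code.
// It reports whether every rune of keyword appears in target in the same
// order, ignoring case. The runes do not have to be adjacent.
func matchFold(keyword, target string) bool {
	k := []rune(strings.ToLower(keyword))
	if len(k) == 0 {
		return true // assumption: an empty keyword matches everything
	}
	i := 0
	for _, r := range strings.ToLower(target) {
		if r == k[i] {
			i++
			if i == len(k) {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(matchFold("API", "my-api-server")) // true: case-insensitive substring
	fmt.Println(matchFold("mas", "my-api-server")) // true: m..a..s appear in order
	fmt.Println(matchFold("zzz", "my-api-server")) // false
}
```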
Usage Examples
Starting the Poller
import "github.com/yusing/godoxy/internal/metrics/uptime"
func init() {
uptime.Poller.Start()
}
HTTP Endpoint
import (
"github.com/gin-gonic/gin"
"github.com/yusing/godoxy/internal/metrics/uptime"
)
func setupUptimeAPI(r *gin.Engine) {
r.GET("/api/uptime", uptime.Poller.ServeHTTP)
}
API Examples:
# Get latest status
curl http://localhost:8080/api/uptime
# Get 1-hour history
curl "http://localhost:8080/api/uptime?period=1h"
# Get with limit and offset (pagination)
curl "http://localhost:8080/api/uptime?limit=10&offset=0"
# Search for routes containing "api"
curl "http://localhost:8080/api/uptime?keyword=api"
# Combined query
curl "http://localhost:8080/api/uptime?period=1d&limit=20&offset=0&keyword=docker"
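The same queries can be built programmatically. The sketch below assembles the query string with net/url so that values are escaped correctly. The buildUptimeURL helper is hypothetical, and the base URL is the one used in the curl examples, which depends on where the endpoint is mounted in a given deployment.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildUptimeURL is a hypothetical client-side helper. Empty or zero
// arguments are omitted from the query string.
func buildUptimeURL(base, period, keyword string, limit, offset int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := url.Values{}
	if period != "" {
		q.Set("period", period)
	}
	if keyword != "" {
		q.Set("keyword", keyword)
	}
	if limit > 0 {
		q.Set("limit", strconv.Itoa(limit))
	}
	if offset > 0 {
		q.Set("offset", strconv.Itoa(offset))
	}
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String(), nil
}

func main() {
	s, err := buildUptimeURL("http://localhost:8080/api/uptime", "1d", "docker", 20, 0)
	if err != nil {
		panic(err)
	}
	// prints: http://localhost:8080/api/uptime?keyword=docker&limit=20&period=1d
	fmt.Println(s)
}
```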
WebSocket Streaming
const ws = new WebSocket(
"ws://localhost:8080/api/uptime?period=5m&interval=5s"
);
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
data.data.forEach((route) => {
console.log(`${route.display_name}: ${route.uptime * 100}% uptime`);
});
};
Direct Data Access
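// Get entries for the last hour
entries, ok := uptime.Poller.Get(period.MetricsPeriod1h)
if !ok {
	// Assumed meaning of ok: no entries are available for this period
	return
}
for _, entry := range entries {
	for alias, status := range entry.Map {
		fmt.Printf("Route %s: %s (latency: %dms)\n",
			alias, status.Status, status.Latency.Milliseconds())
	}
}
// Get aggregated statistics.
// Note: aggregateStatuses is unexported, so this call compiles only inside
// the uptime package; external callers should go through Poller.ServeHTTP.
total, agg := aggregateStatuses(entries, url.Values{
	"period": []string{"1h"},
})
fmt.Printf("%d routes matched\n", total)
for _, route := range agg {
	fmt.Printf("%s: %.1f%% uptime, %.1fms avg latency\n",
		route.DisplayName, route.Uptime*100, route.AvgLatency)
}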
Response Format
Latest Status Response:
{
"alias1": {
"status": "healthy",
"latency": 45
},
"alias2": {
"status": "unhealthy",
"latency": 0
}
}
Aggregated Response:
{
"total": 5,
"data": [
{
"alias": "api-server",
"display_name": "API Server",
"uptime": 0.98,
"downtime": 0.02,
"idle": 0.0,
"avg_latency": 45.5,
"is_docker": true,
"is_excluded": false,
"current_status": "healthy",
"statuses": [
{ "status": "healthy", "latency": 45, "timestamp": 1704892800 }
]
}
]
}
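A client written in Go could decode the aggregated response as sketched below. The struct definitions are local copies shaped after the JSON shown above and the field tags listed under Exported Types; they are assumptions for illustration, not imports from the package, and the status is kept as a plain string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local, illustrative mirrors of the documented JSON shape.
type status struct {
	Status    string `json:"status"`
	Latency   int32  `json:"latency"`
	Timestamp int64  `json:"timestamp"`
}

type routeAggregate struct {
	Alias         string   `json:"alias"`
	DisplayName   string   `json:"display_name"`
	Uptime        float32  `json:"uptime"`
	Downtime      float32  `json:"downtime"`
	Idle          float32  `json:"idle"`
	AvgLatency    float32  `json:"avg_latency"`
	IsDocker      bool     `json:"is_docker"`
	IsExcluded    bool     `json:"is_excluded"`
	CurrentStatus string   `json:"current_status"`
	Statuses      []status `json:"statuses"`
}

type aggregatedResponse struct {
	Total int              `json:"total"`
	Data  []routeAggregate `json:"data"`
}

// decode parses an aggregated response body.
func decode(body []byte) (aggregatedResponse, error) {
	var resp aggregatedResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	body := []byte(`{"total": 5, "data": [{"alias": "api-server",
		"display_name": "API Server", "uptime": 0.98, "downtime": 0.02,
		"current_status": "healthy",
		"statuses": [{"status": "healthy", "latency": 45, "timestamp": 1704892800}]}]}`)
	resp, err := decode(body)
	if err != nil {
		panic(err)
	}
	fmt.Println("total routes:", resp.Total) // prints: total routes: 5
	for _, r := range resp.Data {
		fmt.Printf("%s is %s with %d status samples\n",
			r.DisplayName, r.CurrentStatus, len(r.Statuses))
	}
}
```

Note that total reports the number of matching routes, which can exceed the length of data when limit is set.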
Performance Characteristics
- O(n) status collection per poll where n = number of routes
- O(m * k) aggregation where m = entries, k = routes
- Memory: O(p * r * s) where p = periods, r = routes, s = status size
- Fuzzy search is O(routes * keyword_length)
Testing Notes
- Mock routes.GetHealthInfoWithoutDetail() for testing
- Test aggregation with known status sequences
- Verify pagination and filtering logic
- Test fuzzy search matching
Related Packages
- internal/route/routes - Route health monitoring
- internal/metrics/period - Time-bucketed metrics storage
- internal/types - Health status types