Files

Uptime

Tracks and aggregates route health status over time, providing uptime/downtime statistics and latency metrics.

Overview

The uptime package monitors route health status and calculates uptime percentages over configurable time periods. It integrates with the period package for historical storage and provides aggregated statistics for visualization.

Primary Consumers

  • internal/api/v1/metrics - HTTP endpoint for uptime data
  • internal/homepage - Dashboard uptime widgets
  • Monitoring and alerting systems

Non-goals

  • Does not perform health checks (handled by internal/route/routes)
  • Does not provide alerting on downtime
  • Does not persist data beyond the period package retention
  • Does not aggregate across multiple GoDoxy instances

Stability

Internal package. Data format and API are stable.

Public API

Exported Types

StatusByAlias

type StatusByAlias struct {
    Map       map[string]routes.HealthInfoWithoutDetail `json:"statuses"`
    Timestamp int64                                     `json:"timestamp"`
}

Container for health status of all routes at a specific time.

Status

type Status struct {
    Status    types.HealthStatus `json:"status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
    Latency   int32              `json:"latency"`
    Timestamp int64              `json:"timestamp"`
}

Individual route status at a point in time.

RouteAggregate

type RouteAggregate struct {
    Alias         string             `json:"alias"`
    DisplayName   string             `json:"display_name"`
    Uptime        float32            `json:"uptime"`
    Downtime      float32            `json:"downtime"`
    Idle          float32            `json:"idle"`
    AvgLatency    float32            `json:"avg_latency"`
    IsDocker      bool               `json:"is_docker"`
    IsExcluded    bool               `json:"is_excluded"`
    CurrentStatus types.HealthStatus `json:"current_status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
    Statuses      []Status           `json:"statuses"`
}

Aggregated statistics for a single route.

Aggregated

type Aggregated []RouteAggregate

Slice of route aggregates, sorted alphabetically by alias.

Exported Variables

Poller

var Poller = period.NewPoller("uptime", getStatuses, aggregateStatuses)

Pre-configured poller for uptime metrics. Start with Poller.Start().

Unexported Functions

getStatuses

func getStatuses(ctx context.Context, _ StatusByAlias) (StatusByAlias, error)

Collects current status of all routes. Called by the period poller every second.

Returns:

  • StatusByAlias - Map of all route statuses with current timestamp
  • error - Always nil (errors are logged internally)

aggregateStatuses

func aggregateStatuses(entries []StatusByAlias, query url.Values) (int, Aggregated)

Aggregates status entries into route statistics.

Query Parameters:

  • period - Time filter (5m, 15m, 1h, 1d, 1mo)
  • limit - Maximum number of routes to return (0 = all)
  • offset - Offset for pagination
  • keyword - Fuzzy search keyword for filtering routes

Returns:

  • int - Total number of routes matching the query
  • Aggregated - Slice of route aggregates

Architecture

Core Components

flowchart TD
    subgraph Health Monitoring
        Routes[Routes] -->|GetHealthInfoWithoutDetail| Status[Status Map]
        Status -->|Polls every| Second[1 Second]
    end

    subgraph Poller
        Poll[getStatuses] -->|Collects| StatusByAlias
        StatusByAlias -->|Stores in| Period[Period StatusByAlias]
    end

    subgraph Aggregation
        Query[Query Params] -->|Filters| Aggregate[aggregateStatuses]
        Aggregate -->|Calculates| RouteAggregate
        RouteAggregate -->|Uptime| UP[Uptime %]
        RouteAggregate -->|Downtime| DOWN[Downtime %]
        RouteAggregate -->|Idle| IDLE[Idle %]
        RouteAggregate -->|Latency| LAT[Avg Latency]
    end

    subgraph Response
        RouteAggregate -->|JSON| Client[API Client]
    end

Data Flow

sequenceDiagram
    participant Routes as Route Registry
    participant Poller as Uptime Poller
    participant Period as Period Storage
    participant API as HTTP API

    Routes->>Poller: GetHealthInfoWithoutDetail()
    Poller->>Period: Add(StatusByAlias)

    loop Every second
        Poller->>Routes: Collect status
        Poller->>Period: Store status
    end

    API->>Period: Get(filter)
    Period-->>API: Entries
    API->>API: aggregateStatuses()
    API-->>Client: Aggregated JSON

Status Types

Status Description Counted as Uptime?
healthy Route is responding normally Yes
unhealthy Route is not responding No
unknown Status could not be determined Excluded
napping Route is in idle/sleep state Idle (separate)
starting Route is starting up Idle (separate)

Calculation Formula

For a set of status entries:

Uptime = healthy_count / total_count
Downtime = unhealthy_count / total_count
Idle = (napping_count + starting_count) / total_count
AvgLatency = sum(latency) / count

Note: unknown statuses are excluded from all calculations.

Configuration Surface

No explicit configuration. The poller uses period package defaults:

Parameter Value
Poll Interval 1 second
Retention 5m, 15m, 1h, 1d, 1mo periods

Dependency and Integration Map

Internal Dependencies

Package Purpose
internal/route/routes Health info retrieval
internal/metrics/period Time-bucketed storage
internal/types HealthStatus enum
internal/metrics/utils Query utilities

External Dependencies

Dependency Purpose
github.com/lithammer/fuzzysearch/fuzzy Keyword matching
github.com/bytedance/sonic JSON marshaling

Integration Points

  • Route health monitors provide status via routes.GetHealthInfoWithoutDetail()
  • Period poller handles data collection and storage
  • HTTP API provides query interface via Poller.ServeHTTP

Observability

Logs

Poller lifecycle and errors are logged via zerolog.

Metrics

No metrics exposed directly. Status data available via API.

Failure Modes and Recovery

Failure Detection Recovery
Route health monitor unavailable Empty map returned Log warning, continue
Invalid query parameters aggregateStatuses returns empty Return empty result
Poller panic Goroutine crash Process terminates
Persistence failure Load/save error Log, continue with empty state

The package uses fuzzy.MatchFold for keyword matching:

  • Case-insensitive matching
  • Substring matching
  • Fuzzy ranking

Usage Examples

Starting the Poller

import "github.com/yusing/godoxy/internal/metrics/uptime"

func init() {
    uptime.Poller.Start()
}

HTTP Endpoint

import (
    "github.com/gin-gonic/gin"
    "github.com/yusing/godoxy/internal/metrics/uptime"
)

func setupUptimeAPI(r *gin.Engine) {
    r.GET("/api/uptime", uptime.Poller.ServeHTTP)
}

API Examples:

# Get latest status
curl http://localhost:8080/api/uptime

# Get 1-hour history
curl "http://localhost:8080/api/uptime?period=1h"

# Get with limit and offset (pagination)
curl "http://localhost:8080/api/uptime?limit=10&offset=0"

# Search for routes containing "api"
curl "http://localhost:8080/api/uptime?keyword=api"

# Combined query
curl "http://localhost:8080/api/uptime?period=1d&limit=20&offset=0&keyword=docker"

WebSocket Streaming

const ws = new WebSocket(
  "ws://localhost:8080/api/uptime?period=1m&interval=5s"
);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  data.data.forEach((route) => {
    console.log(`${route.display_name}: ${route.uptime * 100}% uptime`);
  });
};

Direct Data Access

// Get entries for the last hour
entries, ok := uptime.Poller.Get(period.MetricsPeriod1h)
for _, entry := range entries {
    for alias, status := range entry.Map {
        fmt.Printf("Route %s: %s (latency: %dms)\n",
            alias, status.Status, status.Latency.Milliseconds())
    }
}

// Get aggregated statistics
_, agg := uptime.aggregateStatuses(entries, url.Values{
    "period": []string{"1h"},
})

for _, route := range agg {
    fmt.Printf("%s: %.1f%% uptime, %.1fms avg latency\n",
        route.DisplayName, route.Uptime*100, route.AvgLatency)
}

Response Format

Latest Status Response:

{
  "alias1": {
    "status": "healthy",
    "latency": 45
  },
  "alias2": {
    "status": "unhealthy",
    "latency": 0
  }
}

Aggregated Response:

{
  "total": 5,
  "data": [
    {
      "alias": "api-server",
      "display_name": "API Server",
      "uptime": 0.98,
      "downtime": 0.02,
      "idle": 0.0,
      "avg_latency": 45.5,
      "is_docker": true,
      "is_excluded": false,
      "current_status": "healthy",
      "statuses": [
        { "status": "healthy", "latency": 45, "timestamp": 1704892800 }
      ]
    }
  ]
}

Performance Characteristics

  • O(n) status collection per poll where n = number of routes
  • O(m * k) aggregation where m = entries, k = routes
  • Memory: O(p _ r _ s) where p = periods, r = routes, s = status size
  • Fuzzy search is O(routes * keyword_length)

Testing Notes

  • Mock routes.GetHealthInfoWithoutDetail() for testing
  • Test aggregation with known status sequences
  • Verify pagination and filtering logic
  • Test fuzzy search matching
  • internal/route/routes - Route health monitoring
  • internal/metrics/period - Time-bucketed metrics storage
  • internal/types - Health status types