mirror of
https://github.com/yusing/godoxy.git
synced 2026-03-30 13:51:52 +02:00
- Removed `display_name`, `is_docker`, and `is_excluded` fields from the `RouteAggregate` struct and corresponding Swagger documentation. - Updated references in the README and code to reflect the removal of these fields, ensuring consistency across the codebase.
397 lines
10 KiB
Markdown
# Uptime

Tracks and aggregates route health status over time, providing uptime/downtime statistics and latency metrics.

## Overview

The uptime package monitors route health status and calculates uptime percentages over configurable time periods. It integrates with the `period` package for historical storage and provides aggregated statistics for visualization.

### Primary Consumers

- `internal/api/v1/metrics` - HTTP endpoint for uptime data
- `internal/homepage` - Dashboard uptime widgets
- Monitoring and alerting systems

### Non-goals

- Does not perform health checks (handled by `internal/route/routes`)
- Does not provide alerting on downtime
- Does not persist data beyond the period package retention
- Does not aggregate across multiple GoDoxy instances

### Stability

Internal package. Data format and API are stable.

## Public API

### Exported Types

#### StatusByAlias

```go
type StatusByAlias struct {
	Map       map[string]routes.HealthInfoWithoutDetail `json:"statuses"`
	Timestamp int64                                     `json:"timestamp"`
}
```

Container for health status of all routes at a specific time.

#### Status

```go
type Status struct {
	Status    types.HealthStatus `json:"status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
	Latency   int32              `json:"latency"`
	Timestamp int64              `json:"timestamp"`
}
```

Individual route status at a point in time.

#### RouteAggregate

```go
type RouteAggregate struct {
	Alias         string             `json:"alias"`
	Uptime        float32            `json:"uptime"`
	Downtime      float32            `json:"downtime"`
	Idle          float32            `json:"idle"`
	AvgLatency    float32            `json:"avg_latency"`
	CurrentStatus types.HealthStatus `json:"current_status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
	Statuses      []Status           `json:"statuses"`
}
```

Aggregated statistics for a single route.

#### Aggregated

```go
type Aggregated []RouteAggregate
```

Slice of route aggregates, sorted alphabetically by alias.

### Exported Variables

#### Poller

```go
var Poller = period.NewPoller("uptime", getStatuses, aggregateStatuses)
```

Pre-configured poller for uptime metrics. Start with `Poller.Start()`.

### Unexported Functions

#### getStatuses

```go
func getStatuses(ctx context.Context, _ StatusByAlias) (StatusByAlias, error)
```

Collects current status of all routes. Called by the period poller every second.

**Returns:**

- `StatusByAlias` - Map of all route statuses with current timestamp
- `error` - Always nil (errors are logged internally)

#### aggregateStatuses

```go
func aggregateStatuses(entries []StatusByAlias, query url.Values) (int, Aggregated)
```

Aggregates status entries into route statistics.

**Query Parameters:**

- `period` - Time filter (5m, 15m, 1h, 1d, 1mo)
- `limit` - Maximum number of routes to return (0 = all)
- `offset` - Offset for pagination
- `keyword` - Fuzzy search keyword for filtering routes

**Returns:**

- `int` - Total number of routes matching the query
- `Aggregated` - Slice of route aggregates

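The `limit` and `offset` semantics listed for `aggregateStatuses` can be sketched as follows. This is a minimal illustration rather than the package's code: the `paginate` helper is hypothetical, and it assumes that the total is counted before the page is cut and that an out-of-range `offset` yields an empty page.

```go
package main

import "fmt"

// paginate is a hypothetical helper (not part of the uptime package).
// It returns the number of items before paging together with the
// requested page. A limit of 0 means "no limit", matching the
// documented `limit` parameter.
func paginate[T any](items []T, limit, offset int) (total int, page []T) {
	total = len(items)
	if offset < 0 {
		offset = 0
	}
	if offset >= total {
		return total, nil // assumption: past-the-end offsets give an empty page
	}
	items = items[offset:]
	if limit > 0 && limit < len(items) {
		items = items[:limit]
	}
	return total, items
}

func main() {
	aliases := []string{"api", "blog", "db", "git", "wiki"}
	total, page := paginate(aliases, 2, 1)
	fmt.Println(total, page) // prints: 5 [blog db]
}
```

Returning the unpaged total alongside the page lets a client compute how many pages exist without a second request.
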
## Architecture

### Core Components

```mermaid
flowchart TD
    subgraph Health Monitoring
        Routes[Routes] -->|GetHealthInfoWithoutDetail| Status[Status Map]
        Status -->|Polls every| Second[1 Second]
    end

    subgraph Poller
        Poll[getStatuses] -->|Collects| StatusByAlias
        StatusByAlias -->|Stores in| Period[Period StatusByAlias]
    end

    subgraph Aggregation
        Query[Query Params] -->|Filters| Aggregate[aggregateStatuses]
        Aggregate -->|Calculates| RouteAggregate
        RouteAggregate -->|Uptime| UP[Uptime %]
        RouteAggregate -->|Downtime| DOWN[Downtime %]
        RouteAggregate -->|Idle| IDLE[Idle %]
        RouteAggregate -->|Latency| LAT[Avg Latency]
    end

    subgraph Response
        RouteAggregate -->|JSON| Client[API Client]
    end
```

### Data Flow

```mermaid
sequenceDiagram
    participant Routes as Route Registry
    participant Poller as Uptime Poller
    participant Period as Period Storage
    participant API as HTTP API

    Routes->>Poller: GetHealthInfoWithoutDetail()
    Poller->>Period: Add(StatusByAlias)

    loop Every second
        Poller->>Routes: Collect status
        Poller->>Period: Store status
    end

    API->>Period: Get(filter)
    Period-->>API: Entries
    API->>API: aggregateStatuses()
    API-->>Client: Aggregated JSON
```

### Status Types

| Status      | Description                    | Counted as Uptime? |
| ----------- | ------------------------------ | ------------------ |
| `healthy`   | Route is responding normally   | Yes                |
| `unhealthy` | Route is not responding        | No                 |
| `unknown`   | Status could not be determined | Excluded           |
| `napping`   | Route is in idle/sleep state   | Idle (separate)    |
| `starting`  | Route is starting up           | Idle (separate)    |

### Calculation Formula

For a set of status entries:

```
Uptime     = healthy_count / total_count
Downtime   = unhealthy_count / total_count
Idle       = (napping_count + starting_count) / total_count
AvgLatency = sum(latency) / total_count
```

Note: `unknown` statuses are excluded from all calculations, so `total_count` counts only entries whose status is not `unknown`.

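As a worked example of the formulas above, the sketch below computes the four values for a short sequence of samples. It is an illustration under stated assumptions, not the package's implementation: `HealthStatus`, `Sample`, and `summarize` are stand-in names, the denominator is taken to be the number of non-`unknown` samples, and every counted sample (including unhealthy ones) contributes its latency to the average.

```go
package main

import "fmt"

// HealthStatus is a stand-in for types.HealthStatus; the string values
// come from the Status Types table.
type HealthStatus string

const (
	Healthy   HealthStatus = "healthy"
	Unhealthy HealthStatus = "unhealthy"
	Unknown   HealthStatus = "unknown"
	Napping   HealthStatus = "napping"
	Starting  HealthStatus = "starting"
)

// Sample is a hypothetical, trimmed-down status record.
type Sample struct {
	Status  HealthStatus
	Latency int32
}

// summarize returns uptime, downtime, and idle as fractions of the
// counted samples, plus the average latency over those samples.
func summarize(samples []Sample) (up, down, idle, avgLatency float32) {
	var total, latencySum float32
	for _, s := range samples {
		switch s.Status {
		case Healthy:
			up++
		case Unhealthy:
			down++
		case Napping, Starting:
			idle++
		default:
			continue // unknown (or unrecognized) samples are not counted
		}
		total++
		latencySum += float32(s.Latency)
	}
	if total == 0 {
		return 0, 0, 0, 0 // avoid dividing by zero when nothing was counted
	}
	return up / total, down / total, idle / total, latencySum / total
}

func main() {
	samples := []Sample{
		{Healthy, 40}, {Healthy, 50}, {Unhealthy, 0}, {Napping, 0}, {Unknown, 0},
	}
	up, down, idle, lat := summarize(samples)
	// Four samples are counted; the unknown one is skipped.
	fmt.Printf("up=%.2f down=%.2f idle=%.2f avg=%.1f\n", up, down, idle, lat)
	// prints: up=0.50 down=0.25 idle=0.25 avg=22.5
}
```
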
## Configuration Surface

No explicit configuration. The poller uses period package defaults:

| Parameter     | Value                        |
| ------------- | ---------------------------- |
| Poll Interval | 1 second                     |
| Retention     | 5m, 15m, 1h, 1d, 1mo periods |

## Dependency and Integration Map

### Internal Dependencies

| Package                   | Purpose               |
| ------------------------- | --------------------- |
| `internal/route/routes`   | Health info retrieval |
| `internal/metrics/period` | Time-bucketed storage |
| `internal/types`          | HealthStatus enum     |
| `internal/metrics/utils`  | Query utilities       |

### External Dependencies

| Dependency                               | Purpose          |
| ---------------------------------------- | ---------------- |
| `github.com/lithammer/fuzzysearch/fuzzy` | Keyword matching |
| `github.com/bytedance/sonic`             | JSON marshaling  |

### Integration Points

- Route health monitors provide status via `routes.GetHealthInfoWithoutDetail()`
- Period poller handles data collection and storage
- HTTP API provides query interface via `Poller.ServeHTTP`

## Observability

### Logs

Poller lifecycle and errors are logged via zerolog.

### Metrics

No metrics exposed directly. Status data available via API.

## Failure Modes and Recovery

| Failure                          | Detection                         | Recovery                       |
| -------------------------------- | --------------------------------- | ------------------------------ |
| Route health monitor unavailable | Empty map returned                | Log warning, continue          |
| Invalid query parameters         | `aggregateStatuses` returns empty | Return empty result            |
| Poller panic                     | Goroutine crash                   | Process terminates             |
| Persistence failure              | Load/save error                   | Log, continue with empty state |

### Fuzzy Search

The package uses `fuzzy.MatchFold` for keyword matching:

- Case-insensitive matching
- Subsequence matching: the keyword's characters must appear in the alias in order but need not be adjacent, so plain substrings also match
- Match/no-match result only: `MatchFold` returns a boolean and does not rank matches (the library's `RankMatchFold` would be needed for ranking)

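To make the matching behavior concrete, here is a standard-library-only sketch of case-insensitive subsequence matching. It is not the `fuzzysearch` implementation: `matchFold` is a hypothetical stand-in that folds case with `strings.ToLower`, which is an assumption that may differ from the library's Unicode folding in edge cases.

```go
package main

import (
	"fmt"
	"strings"
)

// matchFold is a hypothetical illustration of case-insensitive
// subsequence matching. It reports whether every character of keyword
// appears in alias in the same order, not necessarily adjacent.
func matchFold(keyword, alias string) bool {
	k := []rune(strings.ToLower(keyword))
	if len(k) == 0 {
		return true // assumption: an empty keyword matches everything
	}
	i := 0
	for _, r := range strings.ToLower(alias) {
		if r == k[i] {
			i++
			if i == len(k) {
				return true
			}
		}
	}
	return false
}

func main() {
	for _, alias := range []string{"API-Server", "grafana", "pihole"} {
		fmt.Printf("%s: %t\n", alias, matchFold("api", alias))
	}
	// "asv" is not a substring of "API-Server" but is a subsequence of it.
	fmt.Println(matchFold("asv", "API-Server"))
}
```

Because a subsequence match is looser than a substring match, short keywords can match more routes than a user expects.
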
## Usage Examples

### Starting the Poller

```go
import "github.com/yusing/godoxy/internal/metrics/uptime"

func init() {
	uptime.Poller.Start()
}
```

### HTTP Endpoint

```go
import (
	"github.com/gin-gonic/gin"
	"github.com/yusing/godoxy/internal/metrics/uptime"
)

func setupUptimeAPI(r *gin.Engine) {
	r.GET("/api/uptime", uptime.Poller.ServeHTTP)
}
```

**API Examples:**

```bash
# Get latest status
curl http://localhost:8080/api/uptime

# Get 1-hour history
curl "http://localhost:8080/api/uptime?period=1h"

# Get with limit and offset (pagination)
curl "http://localhost:8080/api/uptime?limit=10&offset=0"

# Search for routes containing "api"
curl "http://localhost:8080/api/uptime?keyword=api"

# Combined query
curl "http://localhost:8080/api/uptime?period=1d&limit=20&offset=0&keyword=docker"
```

### WebSocket Streaming

```javascript
// `period` must be one of the documented values: 5m, 15m, 1h, 1d, 1mo
const ws = new WebSocket(
  "ws://localhost:8080/api/uptime?period=5m&interval=5s"
);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  data.data.forEach((route) => {
    console.log(`${route.alias}: ${route.uptime * 100}% uptime`);
  });
};
```

### Direct Data Access

```go
// Get entries for the last hour
entries, ok := uptime.Poller.Get(period.MetricsPeriod1h)
if !ok {
	return // nothing to report
}
for _, entry := range entries {
	for alias, status := range entry.Map {
		fmt.Printf("Route %s: %s (latency: %dms)\n",
			alias, status.Status, status.Latency.Milliseconds())
	}
}

// Get aggregated statistics.
// aggregateStatuses is unexported, so this call only compiles
// from code inside the uptime package.
_, agg := aggregateStatuses(entries, url.Values{
	"period": []string{"1h"},
})

for _, route := range agg {
	fmt.Printf("%s: %.1f%% uptime, %.1fms avg latency\n",
		route.Alias, route.Uptime*100, route.AvgLatency)
}
```

### Response Format

**Latest Status Response:**

```json
{
  "alias1": {
    "status": "healthy",
    "latency": 45
  },
  "alias2": {
    "status": "unhealthy",
    "latency": 0
  }
}
```

**Aggregated Response:**

```json
{
  "total": 5,
  "data": [
    {
      "alias": "api-server",
      "uptime": 0.98,
      "downtime": 0.02,
      "idle": 0.0,
      "avg_latency": 45.5,
      "current_status": "healthy",
      "statuses": [
        { "status": "healthy", "latency": 45, "timestamp": 1704892800 }
      ]
    }
  ]
}
```

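A Go client can decode the aggregated response with the standard library alone. The sketch below is illustrative: the lowercase types are local stand-ins whose JSON tags mirror the documented `RouteAggregate` and `Status` fields, the `total`/`data` envelope is taken from the sample above, and `decode` and `sample` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// status mirrors the JSON shape of the documented Status type.
type status struct {
	Status    string `json:"status"`
	Latency   int32  `json:"latency"`
	Timestamp int64  `json:"timestamp"`
}

// routeAggregate mirrors the JSON shape of the documented RouteAggregate type.
type routeAggregate struct {
	Alias         string   `json:"alias"`
	Uptime        float32  `json:"uptime"`
	Downtime      float32  `json:"downtime"`
	Idle          float32  `json:"idle"`
	AvgLatency    float32  `json:"avg_latency"`
	CurrentStatus string   `json:"current_status"`
	Statuses      []status `json:"statuses"`
}

// aggregatedResponse is the envelope shown in the sample response.
type aggregatedResponse struct {
	Total int              `json:"total"`
	Data  []routeAggregate `json:"data"`
}

// sample reproduces the aggregated response example in compact form.
const sample = `{"total":5,"data":[{"alias":"api-server","uptime":0.98,
"downtime":0.02,"idle":0.0,"avg_latency":45.5,"current_status":"healthy",
"statuses":[{"status":"healthy","latency":45,"timestamp":1704892800}]}]}`

// decode parses an aggregated response body.
func decode(body []byte) (aggregatedResponse, error) {
	var resp aggregatedResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	resp, err := decode([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println("total routes:", resp.Total)
	for _, r := range resp.Data {
		fmt.Printf("%s is %s with %d recorded statuses\n",
			r.Alias, r.CurrentStatus, len(r.Statuses))
	}
}
```
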
## Performance Characteristics

- O(n) status collection per poll where n = number of routes
- O(m \* k) aggregation where m = entries, k = routes
- Memory: O(p \* r \* s) where p = periods, r = routes, s = status size
- Fuzzy search is O(routes \* keyword_length)

## Testing Notes

- Mock `routes.GetHealthInfoWithoutDetail()` for testing
- Test aggregation with known status sequences
- Verify pagination and filtering logic
- Test fuzzy search matching

## Related Packages

- `internal/route/routes` - Route health monitoring
- `internal/metrics/period` - Time-bucketed metrics storage
- `internal/types` - Health status types