docs: add per package README for implementation details (AI generated with human review)

This commit is contained in:
yusing
2026-01-08 23:39:19 +08:00
parent 2f2828ec48
commit 86b655be3c
54 changed files with 13825 additions and 124 deletions

internal/metrics/README.md

@@ -0,0 +1,118 @@
# Metrics Package
System monitoring and metrics collection for GoDoxy with time-series storage and REST/WebSocket APIs.
## Overview
This package provides a unified metrics collection system that:
- Polls system and route data at regular intervals
- Stores historical data across multiple time periods
- Exposes both REST and WebSocket APIs for consumption
### Primary Consumers
- `internal/api/v1/metrics/` - REST API endpoints
- WebUI - Real-time charts
- `internal/metrics/uptime/` - Route health monitoring
### Non-goals
- Metric aggregation from external sources
- Alerting (handled by `internal/notif/`)
- Long-term storage (30-day retention only)
### Stability
Internal package. See `internal/metrics/period/README.md` for the core framework documentation.
## Packages
### `period/`
Generic time-bucketed metrics storage framework with:
- `Period[T]` - Multi-timeframe container
- `Poller[T, A]` - Background data collector
- `Entries[T]` - Circular buffer for time-series data
See [period/README.md](./period/README.md) for full documentation.
### `uptime/`
Route health status monitoring using the period framework.
### `systeminfo/`
System metrics collection (CPU, memory, disk, network, sensors) using the period framework.
## Architecture
```mermaid
graph TB
subgraph "Data Sources"
SI[SystemInfo Poller]
UP[Uptime Poller]
end
subgraph "Period Framework"
P["Period<T> Generic"]
E["Entries<T> Ring Buffer"]
PL["Poller<T, A> Orchestrator"]
H[Handler HTTP API]
end
subgraph "Storage"
JSON[(data/metrics/*.json)]
end
P --> E
PL --> P
PL --> SI
PL --> UP
H --> PL
PL --> JSON
```
## Configuration Surface
No explicit configuration. Pollers respect `common.MetricsDisable*` flags:
| Flag | Disables |
| ----------------------- | ------------------------- |
| `MetricsDisableCPU` | CPU percentage collection |
| `MetricsDisableMemory` | Memory statistics |
| `MetricsDisableDisk` | Disk usage and I/O |
| `MetricsDisableNetwork` | Network counters |
| `MetricsDisableSensors` | Temperature sensors |
## Dependency and Integration Map
### Internal Dependencies
- `internal/types` - Health check types
### External Dependencies
- `github.com/yusing/goutils/task` - Lifetime management
- `github.com/shirou/gopsutil/v4` - System metrics collection
- `github.com/puzpuzpuz/xsync/v4` - Atomic value storage
- `github.com/bytedance/sonic` - JSON serialization
## Observability
### Logs
| Level | When |
| ------- | ------------------------------------------- |
| `Debug` | Poller start, data load/save |
| `Error` | Data source failures (aggregated every 30s) |
## Failure Modes and Recovery
| Failure Mode | Impact | Recovery |
| ------------------------- | -------------------- | -------------------------------- |
| Data source timeout | Missing data point | Logged, aggregated, continues |
| Disk read failure | No historical data | Starts fresh, warns |
| Disk write failure | Data loss on restart | Continues, retries next interval |
| Memory allocation failure | OOM risk | Go runtime handles |

internal/metrics/period/README.md

@@ -0,0 +1,470 @@
# Period Metrics
Provides time-bucketed metrics storage with configurable periods, enabling historical data aggregation and real-time streaming.
## Overview
The period package implements a generic metrics collection system with time-bucketed storage. It collects data points at regular intervals and stores them in predefined time windows (5m, 15m, 1h, 1d, 1mo) with automatic persistence and HTTP/WebSocket APIs.
### Primary Consumers
- `internal/metrics/uptime` - Route health status storage
- `internal/metrics/systeminfo` - System metrics storage
- `internal/api/v1/metrics` - HTTP API endpoints
### Non-goals
- Does not provide data visualization
- Does not implement alerting or anomaly detection
- Does not support custom time periods (fixed set only)
- Does not provide data aggregation across multiple instances
### Stability
Internal package. Public interfaces are stable.
## Public API
### Exported Types
#### Period[T] Struct
```go
type Period[T any] struct {
Entries map[Filter]*Entries[T]
mu sync.RWMutex
}
```
Container for all time-bucketed entries. Maps each filter to its corresponding `Entries`.
**Methods:**
- `Add(info T)` - Adds a data point to all periods
- `Get(filter Filter) ([]T, bool)` - Gets entries for a specific period
- `Total() int` - Returns total number of entries across all periods
- `ValidateAndFixIntervals()` - Validates and fixes intervals after loading
#### Entries[T] Struct
```go
type Entries[T any] struct {
entries [maxEntries]T
index int
count int
interval time.Duration
lastAdd time.Time
}
```
Circular buffer holding up to 100 entries for a single time period.
**Methods:**
- `Add(now time.Time, info T)` - Adds an entry with interval checking
- `Get() []T` - Returns all entries in chronological order
#### Filter Type
```go
type Filter string
```
Time period filter.
```go
const (
MetricsPeriod5m Filter = "5m"
MetricsPeriod15m Filter = "15m"
MetricsPeriod1h Filter = "1h"
MetricsPeriod1d Filter = "1d"
MetricsPeriod1mo Filter = "1mo"
)
```
#### Poller[T, A] Struct
```go
type Poller[T any, A any] struct {
name string
poll PollFunc[T]
aggregate AggregateFunc[T, A]
resultFilter FilterFunc[T]
period *Period[T]
lastResult synk.Value[T]
errs []pollErr
}
```
Generic poller that collects data at regular intervals.
**Function Types:**
```go
type PollFunc[T any] func(ctx context.Context, lastResult T) (T, error)
type AggregateFunc[T any, A any] func(entries []T, query url.Values) (total int, result A)
type FilterFunc[T any] func(entries []T, keyword string) (filtered []T)
```
#### ResponseType[AggregateT]
```go
type ResponseType[AggregateT any] struct {
Total int `json:"total"`
Data AggregateT `json:"data"`
}
```
Standard response format for API endpoints.
### Exported Functions
#### Period Constructors
```go
func NewPeriod[T any]() *Period[T]
```
Creates a new `Period[T]` with all time buckets initialized.
#### Poller Constructors
```go
func NewPoller[T any, A any](
name string,
poll PollFunc[T],
aggregator AggregateFunc[T, A],
) *Poller[T, A]
```
Creates a new poller with the specified name, poll function, and aggregator.
```go
func (p *Poller[T, A]) WithResultFilter(filter FilterFunc[T]) *Poller[T, A]
```
Adds a result filter to the poller for keyword-based filtering.
#### Poller Methods
```go
func (p *Poller[T, A]) Get(filter Filter) ([]T, bool)
```
Gets entries for a specific time period.
```go
func (p *Poller[T, A]) GetLastResult() T
```
Gets the most recently collected data point.
```go
func (p *Poller[T, A]) Start()
```
Starts the poller. Launches a background goroutine that:
1. Polls for data at 1-second intervals
2. Stores data in all time buckets
3. Saves data to disk every 5 minutes
4. Reports errors every 30 seconds
```go
func (p *Poller[T, A]) ServeHTTP(c *gin.Context)
```
HTTP handler for data retrieval.
## Architecture
### Core Components
```mermaid
flowchart TD
subgraph Poller
Poll[PollFunc] -->|Collects| Data[Data Point T]
Data -->|Adds to| Period[Period T]
Period -->|Stores in| Buckets[Time Buckets]
end
subgraph "Time Buckets"
Bucket5m[5m Bucket] -->|Holds| Entries5m[100 Entries]
Bucket15m[15m Bucket] -->|Holds| Entries15m[100 Entries]
Bucket1h[1h Bucket] -->|Holds| Entries1h[100 Entries]
Bucket1d[1d Bucket] -->|Holds| Entries1d[100 Entries]
Bucket1mo[1mo Bucket] -->|Holds| Entries1mo[100 Entries]
end
subgraph API
Handler[ServeHTTP] -->|Queries| Period
Period -->|Returns| Aggregate[Aggregated Data]
WebSocket[WebSocket] -->|Streams| Periodic[Periodic Updates]
end
subgraph Persistence
Save[save] -->|Writes| File[JSON File]
File -->|Loads| Load[load]
end
```
### Data Flow
```mermaid
sequenceDiagram
participant Collector
participant Poller
participant Period
participant Entries as Time Bucket
participant Storage
Poller->>Poller: Start background goroutine
loop Every 1 second
Poller->>Collector: poll(ctx, lastResult)
Collector-->>Poller: data, error
Poller->>Period: Add(data)
Period->>Entries: Add(now, data)
Entries->>Entries: Circular buffer write
Poller->>Poller: Check save interval (every 5min)
alt Save interval reached
Poller->>Storage: Save to JSON
end
alt Error interval reached (30s)
Poller->>Poller: Gather and log errors
end
end
```
### Time Periods
| Filter | Duration | Interval | Max Entries |
| ------ | ---------- | ------------ | ----------- |
| `5m` | 5 minutes | 3 seconds | 100 |
| `15m` | 15 minutes | 9 seconds | 100 |
| `1h` | 1 hour | 36 seconds | 100 |
| `1d` | 1 day | 14.4 minutes | 100 |
| `1mo` | 30 days | 7.2 hours | 100 |
### Circular Buffer Behavior
```mermaid
stateDiagram-v2
[*] --> Empty: NewEntries()
Empty --> Filling: Add(entry 1)
Filling --> Filling: Add(entry 2..N)
Filling --> Full: count == maxEntries
Full --> Overwrite: Add(new entry)
Overwrite --> Overwrite: index = (index + 1) % max
```
When full, new entries overwrite oldest entries (FIFO).
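That behavior can be illustrated with a minimal generic ring buffer whose fields mirror the `Entries[T]` struct above (an illustrative sketch with a 4-slot capacity, not the package's actual 100-slot implementation):

```go
package main

import "fmt"

const maxEntries = 4 // the real buffer uses 100

type ring[T any] struct {
	entries [maxEntries]T
	index   int // next write position
	count   int // number of valid entries
}

func (r *ring[T]) Add(v T) {
	r.entries[r.index] = v
	r.index = (r.index + 1) % maxEntries // wrap around when full
	if r.count < maxEntries {
		r.count++
	}
}

// Get returns entries in chronological order, oldest first.
func (r *ring[T]) Get() []T {
	out := make([]T, 0, r.count)
	if r.count < maxEntries {
		return append(out, r.entries[:r.count]...)
	}
	// Buffer is full: the oldest entry sits at the current write index.
	out = append(out, r.entries[r.index:]...)
	return append(out, r.entries[:r.index]...)
}

func main() {
	var r ring[int]
	for i := 1; i <= 6; i++ { // 6 adds into a 4-slot buffer
		r.Add(i)
	}
	fmt.Println(r.Get()) // prints [3 4 5 6]: entries 1 and 2 were overwritten
}
```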
## Configuration Surface
### Poller Configuration
| Parameter | Type | Default | Description |
| -------------------- | ------------- | -------------- | -------------------------- |
| `PollInterval` | time.Duration | 1s | How often to poll for data |
| `saveInterval` | time.Duration | 5m | How often to save to disk |
| `gatherErrsInterval` | time.Duration | 30s | Error aggregation interval |
| `saveBaseDir` | string | `data/metrics` | Persistence directory |
### HTTP Query Parameters
| Parameter | Description |
| ------------------ | ----------------------------------- |
| `period` | Time filter (5m, 15m, 1h, 1d, 1mo) |
| `aggregate` | Aggregation mode (package-specific) |
| `interval` | WebSocket update interval |
| `limit` / `offset` | Pagination parameters |
## Dependency and Integration Map
### Internal Dependencies
None.
### External Dependencies
| Dependency | Purpose |
| ------------------------------------------ | ------------------------ |
| `github.com/gin-gonic/gin` | HTTP handling |
| `github.com/yusing/goutils/http/websocket` | WebSocket streaming |
| `github.com/bytedance/sonic` | JSON serialization |
| `github.com/yusing/goutils/task` | Lifetime management |
| `github.com/puzpuzpuz/xsync/v4` | Concurrent value storage |
### Integration Points
- Poll function collects data from external sources
- Aggregate function transforms data for visualization
- Filter function enables keyword-based filtering
- HTTP handler provides REST/WebSocket endpoints
## Observability
### Logs
| Level | When |
| ----- | ------------------------------------- |
| Debug | Poller start/stop, buffer adjustments |
| Error | Load/save failures |
| Info | Data loaded from disk |
### Metrics
None exposed directly. Poll errors are accumulated and logged periodically.
## Security Considerations
- HTTP endpoint should be protected via authentication
- Data files contain potentially sensitive metrics
- No input validation beyond basic query parsing
- WebSocket connections have configurable intervals
## Failure Modes and Recovery
| Failure | Detection | Recovery |
| -------------------- | ---------------------- | ----------------------------------- |
| Poll function error | `poll()` returns error | Error accumulated, logged every 30s |
| JSON load failure | `os.ReadFile` error | Continue with empty period |
| JSON save failure | `Encode` error | Error accumulated, logged |
| Context cancellation | `<-ctx.Done()` | Goroutine exits, final save |
| Disk full | Write error | Error logged, continue |
### Persistence Behavior
1. On startup, attempts to load existing data from `data/metrics/{name}.json`
2. If the file doesn't exist, starts with empty data
3. On load, validates and fixes intervals
4. Saves every 5 minutes during operation
5. Performs a final save on goroutine exit
## Usage Examples
### Defining a Custom Poller
```go
import (
	"context"
	"net/url"
	"time"

	"github.com/yusing/godoxy/internal/metrics/period"
)

type CustomMetric struct {
	Timestamp int64   `json:"timestamp"`
	Value     float64 `json:"value"`
	Name      string  `json:"name"`
}

// Aggregated is the chart-friendly shape returned by the aggregator.
type Aggregated []map[string]any

func pollCustomMetric(ctx context.Context, last CustomMetric) (CustomMetric, error) {
	return CustomMetric{
		Timestamp: time.Now().Unix(),
		Value:     readSensorValue(), // your data source
		Name:      "sensor_1",
	}, nil
}

func aggregateCustomMetric(entries []CustomMetric, query url.Values) (int, Aggregated) {
	aggregated := make(Aggregated, 0, len(entries))
	for _, e := range entries {
		aggregated = append(aggregated, map[string]any{
			"timestamp": e.Timestamp,
			"value":     e.Value,
		})
	}
	return len(aggregated), aggregated
}

var CustomPoller = period.NewPoller("custom", pollCustomMetric, aggregateCustomMetric)
```
### Starting the Poller
```go
// In your main initialization
CustomPoller.Start()
```
### Accessing Data
```go
// Get all entries from the last hour
entries, ok := CustomPoller.Get(period.MetricsPeriod1h)
if ok {
for _, entry := range entries {
fmt.Printf("Value: %.2f at %d\n", entry.Value, entry.Timestamp)
}
}
// Get the most recent value
latest := CustomPoller.GetLastResult()
```
### HTTP Integration
```go
import "github.com/gin-gonic/gin"
func setupMetricsAPI(r *gin.Engine) {
r.GET("/api/metrics/custom", CustomPoller.ServeHTTP)
}
```
**API Examples:**
```bash
# Get last collected data
GET /api/metrics/custom
# Get 1-hour history
GET /api/metrics/custom?period=1h
# Get 1-day history with aggregation
GET /api/metrics/custom?period=1d&aggregate=cpu_average
```
### WebSocket Integration
```go
// WebSocket clients automatically receive updates at the requested interval.
// Example client (assumes the gorilla/websocket package):
ws, _, err := websocket.DefaultDialer.Dial("ws://localhost/api/metrics/custom?interval=5s", nil)
if err != nil {
	log.Fatal(err)
}
defer ws.Close()
for {
	_, msg, err := ws.ReadMessage()
	if err != nil {
		break
	}
	handleUpdate(msg) // process the update (your handler)
}
```
### Data Persistence Format
```json
{
"entries": {
"5m": {
"entries": [...],
"interval": 3000000000
},
"15m": {...},
"1h": {...},
"1d": {...},
"1mo": {...}
}
}
```
## Performance Characteristics
- O(1) add to circular buffer
- O(1) get (returns slice view)
- O(n) serialization where n = total entries
- Memory: O(5 * 100 * sizeof(T)) = fixed overhead
- JSON load/save: O(n) where n = total entries
## Testing Notes
- Test circular buffer overflow behavior
- Test interval validation after load
- Test aggregation with various query parameters
- Test concurrent access to period
- Test error accumulation and reporting
## Related Packages
- `internal/metrics/uptime` - Uses period for health status
- `internal/metrics/systeminfo` - Uses period for system metrics

internal/metrics/systeminfo/README.md

@@ -0,0 +1,439 @@
# System Info
Collects and aggregates system metrics including CPU, memory, disk, network, and sensor data with configurable aggregation modes.
## Overview
The systeminfo package uses a custom fork of the [gopsutil](https://github.com/shirou/gopsutil) library to collect system metrics and integrates with the `period` package for time-bucketed storage. It supports collecting CPU, memory, disk, network, and sensor data with configurable collection intervals and aggregation modes for visualization.
### Primary Consumers
- `internal/api/v1/metrics` - HTTP endpoint for system metrics
- `internal/homepage` - Dashboard system monitoring widgets
- Monitoring and alerting systems
### Non-goals
- Does not provide alerting on metric thresholds
- Does not persist metrics beyond the period package retention
- Does not provide data aggregation across multiple instances
- Does not support custom metric collectors
### Stability
Internal package. Data format and API are stable.
## Public API
### Exported Types
#### SystemInfo Struct
```go
type SystemInfo struct {
Timestamp int64 `json:"timestamp"`
CPUAverage *float64 `json:"cpu_average"`
Memory mem.VirtualMemoryStat `json:"memory"`
Disks map[string]disk.UsageStat `json:"disks"`
DisksIO map[string]*disk.IOCountersStat `json:"disks_io"`
Network net.IOCountersStat `json:"network"`
Sensors Sensors `json:"sensors"`
}
```
Container for all system metrics at a point in time.
**Fields:**
- `Timestamp` - Unix timestamp of collection
- `CPUAverage` - Average CPU usage percentage (0-100)
- `Memory` - Virtual memory statistics (used, total, percent, etc.)
- `Disks` - Disk usage by partition mountpoint
- `DisksIO` - Disk I/O counters by device name
- `Network` - Network I/O counters for primary interface
- `Sensors` - Hardware temperature sensor readings
#### Sensors Type
```go
type Sensors []sensors.TemperatureStat
```
Slice of temperature sensor readings.
#### Aggregated Type
```go
type Aggregated []map[string]any
```
Aggregated data suitable for charting libraries like Recharts. Each entry is a map with timestamp and values.
#### SystemInfoAggregateMode Type
```go
type SystemInfoAggregateMode string
```
Aggregation mode constants:
```go
const (
SystemInfoAggregateModeCPUAverage SystemInfoAggregateMode = "cpu_average"
SystemInfoAggregateModeMemoryUsage SystemInfoAggregateMode = "memory_usage"
SystemInfoAggregateModeMemoryUsagePercent SystemInfoAggregateMode = "memory_usage_percent"
SystemInfoAggregateModeDisksReadSpeed SystemInfoAggregateMode = "disks_read_speed"
SystemInfoAggregateModeDisksWriteSpeed SystemInfoAggregateMode = "disks_write_speed"
SystemInfoAggregateModeDisksIOPS SystemInfoAggregateMode = "disks_iops"
SystemInfoAggregateModeDiskUsage SystemInfoAggregateMode = "disk_usage"
SystemInfoAggregateModeNetworkSpeed SystemInfoAggregateMode = "network_speed"
SystemInfoAggregateModeNetworkTransfer SystemInfoAggregateMode = "network_transfer"
SystemInfoAggregateModeSensorTemperature SystemInfoAggregateMode = "sensor_temperature"
)
```
### Exported Variables
#### Poller
```go
var Poller = period.NewPoller("system_info", getSystemInfo, aggregate)
```
Pre-configured poller for system info metrics. Start with `Poller.Start()`.
### Unexported Functions
#### getSystemInfo
```go
func getSystemInfo(ctx context.Context, lastResult *SystemInfo) (*SystemInfo, error)
```
Collects current system metrics. This is the poll function passed to the period poller.
**Features:**
- Concurrent collection of all metric categories
- Handles partial failures gracefully
- Calculates rates based on previous result (for speed metrics)
- Logs warnings for non-critical errors
**Rate Calculations:**
- Disk read/write speed: `(currentBytes - lastBytes) / interval`
- Disk IOPS: `(currentCount - lastCount) / interval`
- Network speed: `(currentBytes - lastBytes) / interval`
#### aggregate
```go
func aggregate(entries []*SystemInfo, query url.Values) (total int, result Aggregated)
```
Aggregates system info entries for a specific mode. Called by the period poller.
**Query Parameters:**
- `aggregate` - The aggregation mode (see constants above)
**Returns:**
- `total` - Number of aggregated entries
- `result` - Slice of maps suitable for charting
## Architecture
### Core Components
```mermaid
flowchart TD
subgraph Collection
G[gopsutil] -->|CPU| CPU[CPU Percent]
G -->|Memory| Mem[Virtual Memory]
G -->|Disks| Disk[Partitions & IO]
G -->|Network| Net[Network Counters]
G -->|Sensors| Sens[Temperature]
end
subgraph Poller
Collect[getSystemInfo] -->|Aggregates| Info[SystemInfo]
Info -->|Stores in| Period[Period SystemInfo]
end
subgraph "Aggregation Modes"
CPUAvg[cpu_average]
MemUsage[memory_usage]
MemPercent[memory_usage_percent]
DiskRead[disks_read_speed]
DiskWrite[disks_write_speed]
DiskIOPS[disks_iops]
DiskUsage[disk_usage]
NetSpeed[network_speed]
NetTransfer[network_transfer]
SensorTemp[sensor_temperature]
end
Period -->|Query with| Aggregate[aggregate function]
Aggregate --> CPUAvg
Aggregate --> MemUsage
Aggregate --> DiskRead
```
### Data Flow
```mermaid
sequenceDiagram
participant gopsutil
participant Poller
participant Period
participant API
Poller->>Poller: Start background goroutine
loop Every 1 second
Poller->>gopsutil: Collect CPU (500ms timeout)
Poller->>gopsutil: Collect Memory
Poller->>gopsutil: Collect Disks (partition + IO)
Poller->>gopsutil: Collect Network
Poller->>gopsutil: Collect Sensors
gopsutil-->>Poller: SystemInfo
Poller->>Period: Add(SystemInfo)
end
API->>Period: Get(filter)
Period-->>API: Entries
API->>API: aggregate(entries, mode)
API-->>Client: Chart data
```
### Collection Categories
| Category | Data Source | Optional | Rate Metrics |
| -------- | ------------------------------------------------------ | -------- | --------------------- |
| CPU | `cpu.PercentWithContext` | Yes | No |
| Memory | `mem.VirtualMemoryWithContext` | Yes | No |
| Disks | `disk.PartitionsWithContext` + `disk.UsageWithContext` | Yes | Yes (read/write/IOPS) |
| Network | `net.IOCountersWithContext` | Yes | Yes (upload/download) |
| Sensors | `sensors.TemperaturesWithContext` | Yes | No |
### Aggregation Modes
Each mode produces chart-friendly output:
**CPU Average:**
```json
[
{ "timestamp": 1704892800, "cpu_average": 45.5 },
{ "timestamp": 1704892810, "cpu_average": 52.3 }
]
```
**Memory Usage:**
```json
[
{ "timestamp": 1704892800, "memory_usage": 8388608000 },
{ "timestamp": 1704892810, "memory_usage": 8453440000 }
]
```
**Disk Read/Write Speed:**
```json
[
{ "timestamp": 1704892800, "sda": 10485760, "sdb": 5242880 },
{ "timestamp": 1704892810, "sda": 15728640, "sdb": 4194304 }
]
```
## Configuration Surface
### Disabling Metrics Categories
Metrics categories can be disabled via environment variables:
| Variable | Purpose |
| ------------------------- | ------------------------------------------- |
| `METRICS_DISABLE_CPU` | Set to "true" to disable CPU collection |
| `METRICS_DISABLE_MEMORY` | Set to "true" to disable memory collection |
| `METRICS_DISABLE_DISK` | Set to "true" to disable disk collection |
| `METRICS_DISABLE_NETWORK` | Set to "true" to disable network collection |
| `METRICS_DISABLE_SENSORS` | Set to "true" to disable sensor collection |
## Dependency and Integration Map
### Internal Dependencies
| Package | Purpose |
| -------------------------------- | --------------------- |
| `internal/metrics/period` | Time-bucketed storage |
| `internal/common` | Configuration flags |
| `github.com/yusing/goutils/errs` | Error handling |
### External Dependencies
| Dependency | Purpose |
| ------------------------------- | ------------------------- |
| `github.com/shirou/gopsutil/v4` | System metrics collection |
| `github.com/rs/zerolog` | Logging |
### Integration Points
- gopsutil provides raw system metrics
- period package handles storage and persistence
- HTTP API provides query interface
## Observability
### Logs
| Level | When |
| ----- | ------------------------------------------ |
| Warn | Non-critical errors (e.g., no sensor data) |
| Error | Other errors |
### Metrics
No metrics exposed directly. Collection errors are logged.
## Failure Modes and Recovery
| Failure | Detection | Recovery |
| --------------- | ------------------------------------ | -------------------------------- |
| No CPU data | `cpu.Percent` returns error | Skip and log later with warning |
| No memory data | `mem.VirtualMemory` returns error | Skip and log later with warning |
| No disk data | `disk.Usage` returns error for all | Skip and log later with warning |
| No network data | `net.IOCounters` returns error | Skip and log later with warning |
| No sensor data | `sensors.Temperatures` returns error | Skip and log later with warning |
| Context timeout | Context deadline exceeded | Return partial data with warning |
### Partial Collection
The package uses `gperr.NewGroup` to collect errors from concurrent operations:
```go
errs := gperr.NewGroup("failed to get system info")
errs.Go(func() error { return s.collectCPUInfo(ctx) })
errs.Go(func() error { return s.collectMemoryInfo(ctx) })
// ...
result := errs.Wait()
```
Warnings (like `ENODATA`) are logged but don't fail the collection.
Critical errors cause the function to return an error.
## Usage Examples
### Starting the Poller
```go
import "github.com/yusing/godoxy/internal/metrics/systeminfo"
func init() {
systeminfo.Poller.Start()
}
```
### HTTP Endpoint
```go
import "github.com/gin-gonic/gin"
func setupMetricsAPI(r *gin.Engine) {
r.GET("/api/metrics/system", systeminfo.Poller.ServeHTTP)
}
```
**API Examples:**
```bash
# Get latest metrics
curl http://localhost:8080/api/metrics/system
# Get 1-hour history with CPU aggregation
curl "http://localhost:8080/api/metrics/system?period=1h&aggregate=cpu_average"
# Get 24-hour memory usage history
curl "http://localhost:8080/api/metrics/system?period=1d&aggregate=memory_usage_percent"
# Get disk I/O for the last hour
curl "http://localhost:8080/api/metrics/system?period=1h&aggregate=disks_read_speed"
```
### WebSocket Streaming
```javascript
const ws = new WebSocket(
"ws://localhost:8080/api/metrics/system?period=1m&interval=5s&aggregate=cpu_average"
);
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log("CPU:", data.data);
};
```
### Direct Data Access
```go
// Get entries for the last hour
entries, ok := systeminfo.Poller.Get(period.MetricsPeriod1h)
if !ok {
	return // no data for this period yet
}
for _, entry := range entries {
if entry.CPUAverage != nil {
fmt.Printf("CPU: %.1f%% at %d\n", *entry.CPUAverage, entry.Timestamp)
}
}
// Get the most recent metrics
latest := systeminfo.Poller.GetLastResult()
```
### Disabling Metrics at Runtime
```go
import (
"github.com/yusing/godoxy/internal/common"
"github.com/yusing/godoxy/internal/metrics/systeminfo"
)
func init() {
// Disable expensive sensor collection
common.MetricsDisableSensors = true
systeminfo.Poller.Start()
}
```
## Performance Characteristics
- O(1) per metric collection (gopsutil handles complexity)
- Concurrent collection of all categories
- Rate calculations O(n) where n = number of disks/interfaces
- Memory: O(5 * 100 * sizeof(SystemInfo))
- JSON serialization O(n) for API responses
### Collection Latency
| Category | Typical Latency |
| -------- | -------------------------------------- |
| CPU | ~10-50ms |
| Memory | ~5-10ms |
| Disks | ~10-100ms (depends on partition count) |
| Network | ~5-10ms |
| Sensors | ~10-50ms |
## Testing Notes
- Mock gopsutil calls for unit tests
- Test with real metrics to verify rate calculations
- Test aggregation modes with various data sets
- Verify disable flags work correctly
- Test partial failure scenarios
## Related Packages
- `internal/metrics/period` - Time-bucketed storage
- `internal/api/v1/metrics` - HTTP API endpoints
- `github.com/shirou/gopsutil/v4` - System metrics library

internal/metrics/uptime/README.md

@@ -0,0 +1,402 @@
# Uptime
Tracks and aggregates route health status over time, providing uptime/downtime statistics and latency metrics.
## Overview
The uptime package monitors route health status and calculates uptime percentages over configurable time periods. It integrates with the `period` package for historical storage and provides aggregated statistics for visualization.
### Primary Consumers
- `internal/api/v1/metrics` - HTTP endpoint for uptime data
- `internal/homepage` - Dashboard uptime widgets
- Monitoring and alerting systems
### Non-goals
- Does not perform health checks (handled by `internal/route/routes`)
- Does not provide alerting on downtime
- Does not persist data beyond the period package retention
- Does not aggregate across multiple GoDoxy instances
### Stability
Internal package. Data format and API are stable.
## Public API
### Exported Types
#### StatusByAlias
```go
type StatusByAlias struct {
Map map[string]routes.HealthInfoWithoutDetail `json:"statuses"`
Timestamp int64 `json:"timestamp"`
}
```
Container for health status of all routes at a specific time.
#### Status
```go
type Status struct {
Status types.HealthStatus `json:"status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
Latency int32 `json:"latency"`
Timestamp int64 `json:"timestamp"`
}
```
Individual route status at a point in time.
#### RouteAggregate
```go
type RouteAggregate struct {
Alias string `json:"alias"`
DisplayName string `json:"display_name"`
Uptime float32 `json:"uptime"`
Downtime float32 `json:"downtime"`
Idle float32 `json:"idle"`
AvgLatency float32 `json:"avg_latency"`
IsDocker bool `json:"is_docker"`
IsExcluded bool `json:"is_excluded"`
CurrentStatus types.HealthStatus `json:"current_status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
Statuses []Status `json:"statuses"`
}
```
Aggregated statistics for a single route.
#### Aggregated
```go
type Aggregated []RouteAggregate
```
Slice of route aggregates, sorted alphabetically by alias.
### Exported Variables
#### Poller
```go
var Poller = period.NewPoller("uptime", getStatuses, aggregateStatuses)
```
Pre-configured poller for uptime metrics. Start with `Poller.Start()`.
### Unexported Functions
#### getStatuses
```go
func getStatuses(ctx context.Context, _ StatusByAlias) (StatusByAlias, error)
```
Collects current status of all routes. Called by the period poller every second.
**Returns:**
- `StatusByAlias` - Map of all route statuses with current timestamp
- `error` - Always nil (errors are logged internally)
#### aggregateStatuses
```go
func aggregateStatuses(entries []StatusByAlias, query url.Values) (int, Aggregated)
```
Aggregates status entries into route statistics.
**Query Parameters:**
- `period` - Time filter (5m, 15m, 1h, 1d, 1mo)
- `limit` - Maximum number of routes to return (0 = all)
- `offset` - Offset for pagination
- `keyword` - Fuzzy search keyword for filtering routes
**Returns:**
- `int` - Total number of routes matching the query
- `Aggregated` - Slice of route aggregates
## Architecture
### Core Components
```mermaid
flowchart TD
subgraph "Health Monitoring"
Routes[Routes] -->|GetHealthInfoWithoutDetail| Status[Status Map]
Status -->|Polls every| Second[1 Second]
end
subgraph Poller
Poll[getStatuses] -->|Collects| StatusByAlias
StatusByAlias -->|Stores in| Period[Period StatusByAlias]
end
subgraph Aggregation
Query[Query Params] -->|Filters| Aggregate[aggregateStatuses]
Aggregate -->|Calculates| RouteAggregate
RouteAggregate -->|Uptime| UP[Uptime %]
RouteAggregate -->|Downtime| DOWN[Downtime %]
RouteAggregate -->|Idle| IDLE[Idle %]
RouteAggregate -->|Latency| LAT[Avg Latency]
end
subgraph Response
RouteAggregate -->|JSON| Client[API Client]
end
```
### Data Flow
```mermaid
sequenceDiagram
participant Routes as Route Registry
participant Poller as Uptime Poller
participant Period as Period Storage
participant API as HTTP API
Routes->>Poller: GetHealthInfoWithoutDetail()
Poller->>Period: Add(StatusByAlias)
loop Every second
Poller->>Routes: Collect status
Poller->>Period: Store status
end
API->>Period: Get(filter)
Period-->>API: Entries
API->>API: aggregateStatuses()
API-->>Client: Aggregated JSON
```
### Status Types
| Status | Description | Counted as Uptime? |
| ----------- | ------------------------------ | ------------------ |
| `healthy` | Route is responding normally | Yes |
| `unhealthy` | Route is not responding | No |
| `unknown` | Status could not be determined | Excluded |
| `napping` | Route is in idle/sleep state | Idle (separate) |
| `starting` | Route is starting up | Idle (separate) |
### Calculation Formula
For a set of status entries:
```
Uptime = healthy_count / total_count
Downtime = unhealthy_count / total_count
Idle = (napping_count + starting_count) / total_count
AvgLatency = sum(latency) / count
```
Note: `unknown` statuses are excluded from all calculations.
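The formula can be written out directly; `unknown` entries are dropped before the denominator is taken (an illustrative sketch of the stated rules, not the package's actual code):

```go
package main

import "fmt"

type stats struct{ uptime, downtime, idle float32 }

// aggregate applies the calculation rules above to raw status counts.
func aggregate(healthy, unhealthy, napping, starting, unknown int) stats {
	total := healthy + unhealthy + napping + starting // unknown excluded
	if total == 0 {
		return stats{}
	}
	return stats{
		uptime:   float32(healthy) / float32(total),
		downtime: float32(unhealthy) / float32(total),
		idle:     float32(napping+starting) / float32(total),
	}
}

func main() {
	s := aggregate(98, 2, 0, 0, 5) // the 5 unknown samples are ignored
	fmt.Printf("uptime=%.2f downtime=%.2f idle=%.2f\n", s.uptime, s.downtime, s.idle)
}
```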
## Configuration Surface
No explicit configuration. The poller uses period package defaults:
| Parameter | Value |
| ------------- | ---------------------------- |
| Poll Interval | 1 second |
| Retention | 5m, 15m, 1h, 1d, 1mo periods |
## Dependency and Integration Map
### Internal Dependencies
| Package | Purpose |
| ------------------------- | --------------------- |
| `internal/route/routes` | Health info retrieval |
| `internal/metrics/period` | Time-bucketed storage |
| `internal/types` | HealthStatus enum |
| `internal/metrics/utils` | Query utilities |
### External Dependencies
| Dependency | Purpose |
| ---------------------------------------- | ---------------- |
| `github.com/lithammer/fuzzysearch/fuzzy` | Keyword matching |
| `github.com/bytedance/sonic` | JSON marshaling |
### Integration Points
- Route health monitors provide status via `routes.GetHealthInfoWithoutDetail()`
- Period poller handles data collection and storage
- HTTP API provides query interface via `Poller.ServeHTTP`
## Observability
### Logs
Poller lifecycle and errors are logged via zerolog.
### Metrics
No metrics exposed directly. Status data available via API.
## Failure Modes and Recovery
| Failure | Detection | Recovery |
| -------------------------------- | --------------------------------- | ------------------------------ |
| Route health monitor unavailable | Empty map returned | Log warning, continue |
| Invalid query parameters | `aggregateStatuses` returns empty | Return empty result |
| Poller panic | Goroutine crash | Process terminates |
| Persistence failure | Load/save error | Log, continue with empty state |
### Fuzzy Search
The package uses `fuzzy.MatchFold` for keyword matching:
- Case-insensitive matching
- Substring matching
- Fuzzy ranking
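`fuzzy.MatchFold` reports whether the keyword's characters appear in order (not necessarily contiguously) in the target, ignoring case. A stdlib sketch of that matching rule (illustrative only; the package itself uses `github.com/lithammer/fuzzysearch/fuzzy`):

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// matchFold reports whether all runes of keyword appear in order in target,
// case-insensitively — the matching rule that fuzzy.MatchFold implements.
func matchFold(keyword, target string) bool {
	keyword = strings.ToLower(keyword)
	target = strings.ToLower(target)
	for _, r := range keyword {
		i := strings.IndexRune(target, r)
		if i < 0 {
			return false
		}
		target = target[i+utf8.RuneLen(r):] // continue searching after the match
	}
	return true
}

func main() {
	fmt.Println(matchFold("api", "my-API-server")) // true
	fmt.Println(matchFold("dkr", "docker-nginx"))  // true: d…k…r appear in order
	fmt.Println(matchFold("xyz", "docker-nginx"))  // false
}
```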
## Usage Examples
### Starting the Poller
```go
import "github.com/yusing/godoxy/internal/metrics/uptime"
func init() {
uptime.Poller.Start()
}
```
### HTTP Endpoint
```go
import (
"github.com/gin-gonic/gin"
"github.com/yusing/godoxy/internal/metrics/uptime"
)
func setupUptimeAPI(r *gin.Engine) {
r.GET("/api/uptime", uptime.Poller.ServeHTTP)
}
```
**API Examples:**
```bash
# Get latest status
curl http://localhost:8080/api/uptime
# Get 1-hour history
curl "http://localhost:8080/api/uptime?period=1h"
# Get with limit and offset (pagination)
curl "http://localhost:8080/api/uptime?limit=10&offset=0"
# Search for routes containing "api"
curl "http://localhost:8080/api/uptime?keyword=api"
# Combined query
curl "http://localhost:8080/api/uptime?period=1d&limit=20&offset=0&keyword=docker"
```
### WebSocket Streaming
```javascript
const ws = new WebSocket(
"ws://localhost:8080/api/uptime?period=1m&interval=5s"
);
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
data.data.forEach((route) => {
console.log(`${route.display_name}: ${route.uptime * 100}% uptime`);
});
};
```
### Direct Data Access
```go
// Get entries for the last hour
entries, ok := uptime.Poller.Get(period.MetricsPeriod1h)
if !ok {
	return // no data for this period yet
}
for _, entry := range entries {
for alias, status := range entry.Map {
fmt.Printf("Route %s: %s (latency: %dms)\n",
alias, status.Status, status.Latency.Milliseconds())
}
}
// Get aggregated statistics. aggregateStatuses is unexported, so this
// only compiles inside the uptime package; external callers use the HTTP API.
_, agg := aggregateStatuses(entries, url.Values{
    "period": []string{"1h"},
})
for _, route := range agg {
fmt.Printf("%s: %.1f%% uptime, %.1fms avg latency\n",
route.DisplayName, route.Uptime*100, route.AvgLatency)
}
```
### Response Format
**Latest Status Response:**
```json
{
"alias1": {
"status": "healthy",
"latency": 45
},
"alias2": {
"status": "unhealthy",
"latency": 0
}
}
```
**Aggregated Response:**
```json
{
"total": 5,
"data": [
{
"alias": "api-server",
"display_name": "API Server",
"uptime": 0.98,
"downtime": 0.02,
"idle": 0.0,
"avg_latency": 45.5,
"is_docker": true,
"is_excluded": false,
"current_status": "healthy",
"statuses": [
{ "status": "healthy", "latency": 45, "timestamp": 1704892800 }
]
}
]
}
```
## Performance Characteristics
- O(n) status collection per poll where n = number of routes
- O(m * k) aggregation where m = entries, k = routes
- Memory: O(p * r * s) where p = periods, r = routes, s = status size
- Fuzzy search is O(routes * keyword length)
## Testing Notes
- Mock `routes.GetHealthInfoWithoutDetail()` for testing
- Test aggregation with known status sequences
- Verify pagination and filtering logic
- Test fuzzy search matching
## Related Packages
- `internal/route/routes` - Route health monitoring
- `internal/metrics/period` - Time-bucketed metrics storage
- `internal/types` - Health status types