docs: add per package README for implementation details (AI generated with human review)

yusing
2026-01-08 23:39:19 +08:00
parent 13441286d1
commit e9d7edef12
54 changed files with 13431 additions and 1519 deletions

@@ -1,285 +1,118 @@
# Metrics Package
System monitoring and metrics collection for GoDoxy.
System monitoring and metrics collection for GoDoxy with time-series storage and REST/WebSocket APIs.
## Overview
This package provides a unified metrics collection system that polls system and route data at regular intervals, stores historical data across multiple time periods, and exposes both REST and WebSocket APIs for consumption.
This package provides a unified metrics collection system that:
- Polls system and route data at regular intervals
- Stores historical data across multiple time periods
- Exposes both REST and WebSocket APIs for consumption
### Primary Consumers
- `internal/api/v1/metrics/` - REST API endpoints
- WebUI - Real-time charts
- `internal/metrics/uptime/` - Route health monitoring
### Non-goals
- Metric aggregation from external sources
- Alerting (handled by `internal/notif/`)
- Long-term storage (30-day retention only)
### Stability
Internal package. See `internal/metrics/period/README.md` for the core framework documentation.
## Packages
### `period/`
Generic time-bucketed metrics storage framework with:
- `Period[T]` - Multi-timeframe container
- `Poller[T, A]` - Background data collector
- `Entries[T]` - Circular buffer for time-series data
See [period/README.md](./period/README.md) for full documentation.
### `uptime/`
Route health status monitoring using the period framework.
### `systeminfo/`
System metrics collection (CPU, memory, disk, network, sensors) using the period framework.
## Architecture
```mermaid
graph TB
subgraph "Core Framework"
P[Period<T> Generic]
E[Entries<T> Ring Buffer]
PL[Poller<T, A> Orchestrator]
end
subgraph "Data Sources"
SI[SystemInfo Poller]
UP[Uptime Poller]
end
subgraph "Utilities"
UT[Utils]
subgraph "Period Framework"
P[Period<T> Generic]
E[Entries<T> Ring Buffer]
PL[Poller<T, A> Orchestrator]
H[Handler HTTP API]
end
subgraph "Storage"
JSON[(data/metrics/*.json)]
end
P --> E
PL --> P
PL --> SI
PL --> UP
UT -.-> PL
UT -.-> SI
UT -.-> UP
H --> PL
PL --> JSON
```
## Directory Structure
## Configuration Surface
```
internal/metrics/
├── period/ # Core polling and storage framework
│ ├── period.go # Period[T] - multi-timeframe container
│ ├── entries.go # Entries[T] - ring buffer implementation
│ ├── poller.go # Poller[T, A] - orchestration and lifecycle
│ └── handler.go # HTTP handler for data access
├── systeminfo/ # System metrics (CPU, memory, disk, network, sensors)
├── uptime/ # Route health and uptime monitoring
└── utils/ # Shared utilities (query parsing, pagination)
```
No explicit configuration. Pollers respect `common.MetricsDisable*` flags:
## Core Components
| Flag | Disables |
| ----------------------- | ------------------------- |
| `MetricsDisableCPU` | CPU percentage collection |
| `MetricsDisableMemory` | Memory statistics |
| `MetricsDisableDisk` | Disk usage and I/O |
| `MetricsDisableNetwork` | Network counters |
| `MetricsDisableSensors` | Temperature sensors |
### 1. Period[T] (`period/period.go`)
## Dependency and Integration Map
A generic container that manages multiple time periods for the same data type.
### Internal Dependencies
```go
type Period[T any] struct {
	Entries map[Filter]*Entries[T] // 5m, 15m, 1h, 1d, 1mo
	mu      sync.RWMutex
}
```
- `github.com/yusing/goutils/task` - Lifetime management
- `internal/types` - Health check types
**Time Periods:**
### External Dependencies
| Filter | Duration | Entries | Interval |
| ------ | -------- | ------- | -------- |
| `5m` | 5 min | 100 | 3s |
| `15m` | 15 min | 100 | 9s |
| `1h` | 1 hour | 100 | 36s |
| `1d` | 1 day | 100 | 14.4m |
| `1mo` | 30 days | 100 | 7.2h |
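The intervals in the table follow from dividing each period's duration by the 100-entry buffer size. A minimal sketch of that arithmetic; `maxEntries`, `sampleInterval`, and `intervalSeconds` are illustrative names, not identifiers taken from the package:

```go
package main

import (
	"fmt"
	"time"
)

// maxEntries mirrors the 100-entry buffer size from the table (assumed name).
const maxEntries = 100

// sampleInterval returns how often a period must be sampled so that
// maxEntries samples span the whole period.
func sampleInterval(period time.Duration) time.Duration {
	return period / maxEntries
}

// intervalSeconds is a convenience wrapper that works in whole seconds.
func intervalSeconds(periodSeconds int64) float64 {
	return sampleInterval(time.Duration(periodSeconds) * time.Second).Seconds()
}

func main() {
	periods := []struct {
		filter   string
		duration time.Duration
	}{
		{"5m", 5 * time.Minute},
		{"15m", 15 * time.Minute},
		{"1h", time.Hour},
		{"1d", 24 * time.Hour},
		{"1mo", 30 * 24 * time.Hour},
	}
	for _, p := range periods {
		fmt.Printf("%-4s %v\n", p.filter, sampleInterval(p.duration))
	}
}
```

Keeping the entry count fixed means every period costs the same memory; only the sampling resolution changes.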
- `github.com/shirou/gopsutil/v4` - System metrics collection
- `github.com/puzpuzpuz/xsync/v4` - Atomic value storage
- `github.com/bytedance/sonic` - JSON serialization
### 2. Entries[T] (`period/entries.go`)
## Observability
A fixed-size ring buffer (100 entries) with time-aware sampling.
### Logs
```go
type Entries[T any] struct {
	entries  [100]T        // Fixed-size array
	index    int           // Current position
	count    int           // Number of entries
	interval time.Duration // Sampling interval
	lastAdd  time.Time     // Last write timestamp
}
```
| Level | When |
| ------- | ------------------------------------------- |
| `Debug` | Poller start, data load/save |
| `Error` | Data source failures (aggregated every 30s) |
**Features:**
## Failure Modes and Recovery
- Circular buffer for efficient memory usage
- Rate-limited adds (respects configured interval)
- JSON serialization/deserialization with temporal spacing
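The ring buffer and rate-limited add described above can be sketched as follows. This is an illustration under assumptions, not the package's code: the `ring` type, its `add` and `snapshot` methods, and the `feedEverySecond` helper are hypothetical, and the real `Entries[T]` may decide when to accept a sample differently.

```go
package main

import (
	"fmt"
	"time"
)

// capacity mirrors the 100-entry size of the buffer (assumed name).
const capacity = 100

// ring is a hypothetical stand-in for a fixed-size, time-aware buffer.
type ring[T any] struct {
	items    [capacity]T
	index    int // next write position
	count    int // number of valid entries
	interval time.Duration
	lastAdd  time.Time
}

func newRing[T any](interval time.Duration) *ring[T] {
	return &ring[T]{interval: interval}
}

// add stores v unless the last accepted sample is younger than interval.
// It reports whether the sample was stored.
func (r *ring[T]) add(now time.Time, v T) bool {
	if !r.lastAdd.IsZero() && now.Sub(r.lastAdd) < r.interval {
		return false
	}
	r.items[r.index] = v
	r.index = (r.index + 1) % capacity // wrap around, overwriting the oldest
	if r.count < capacity {
		r.count++
	}
	r.lastAdd = now
	return true
}

// snapshot returns the stored entries, oldest first.
func (r *ring[T]) snapshot() []T {
	out := make([]T, 0, r.count)
	start := (r.index - r.count + capacity) % capacity
	for i := 0; i < r.count; i++ {
		out = append(out, r.items[(start+i)%capacity])
	}
	return out
}

// feedEverySecond offers one sample per simulated second to a 3s ring.
func feedEverySecond(n int) []int {
	r := newRing[int](3 * time.Second)
	base := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	for i := 0; i < n; i++ {
		r.add(base.Add(time.Duration(i)*time.Second), i)
	}
	return r.snapshot()
}

func main() {
	fmt.Println(feedEverySecond(10))
}
```

Passing the timestamp into `add` instead of calling `time.Now()` inside it keeps the sketch deterministic and easy to test.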
### 3. Poller[T, A] (`period/poller.go`)
The orchestrator that ties together polling, storage, and HTTP serving.
```go
type Poller[T any, A any] struct {
	name         string
	poll         PollFunc[T]         // Data collection
	aggregate    AggregateFunc[T, A] // Data aggregation
	resultFilter FilterFunc[T]       // Query filtering
	period       *Period[T]          // Data storage
	lastResult   synk.Value[T]       // Latest snapshot
}
```
**Poll Cycle (1 second interval):**
```mermaid
sequenceDiagram
participant T as Task
participant P as Poller
participant D as Data Source
participant S as Storage (Period)
participant F as File
T->>P: Start()
P->>F: Load historical data
F-->>P: Period[T] state
loop Every 1 second
P->>D: Poll(ctx, lastResult)
D-->>P: New data point
P->>S: Add to all periods
P->>P: Update lastResult
alt Every 30 seconds
P->>P: Gather & log errors
end
alt Every 5 minutes
P->>F: Persist to JSON
end
end
```
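The loop in the diagram can be approximated with a tick-driven function. The sketch below is a simplification under assumptions: `run`, `simulate`, `stats`, and the two constants are hypothetical names, ticks arrive on a plain channel (a real poller would more likely read from `time.NewTicker(time.Second).C`), and persistence is only counted, not performed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

const (
	errFlushEvery = 30  // ticks between error-log flushes (30s at 1 tick/s)
	saveEvery     = 300 // ticks between saves (5min at 1 tick/s)
)

// stats records what the loop did, so the behaviour can be inspected.
type stats struct{ polls, stored, flushes, saves int }

// run polls once per tick, remembers the last good result, batches
// errors, and counts the points at which data would be persisted.
func run(ctx context.Context, ticks <-chan struct{},
	poll func(ctx context.Context, last int) (int, error)) stats {
	var s stats
	var pending []error
	last := 0
	for {
		select {
		case <-ctx.Done():
			return s
		case _, ok := <-ticks:
			if !ok {
				return s
			}
			s.polls++
			v, err := poll(ctx, last)
			if err != nil {
				pending = append(pending, err)
			} else {
				last = v // a real poller would also add v to every period
				s.stored++
			}
			if s.polls%errFlushEvery == 0 && len(pending) > 0 {
				fmt.Printf("%d poll errors, first: %v\n", len(pending), pending[0])
				pending = pending[:0]
				s.flushes++
			}
			if s.polls%saveEvery == 0 {
				s.saves++ // placeholder for writing JSON to disk
			}
		}
	}
}

// simulate feeds n ticks to run; every failEvery-th poll fails (0 = never).
func simulate(n, failEvery int) stats {
	ticks := make(chan struct{}, n)
	for i := 0; i < n; i++ {
		ticks <- struct{}{}
	}
	close(ticks)
	calls := 0
	return run(context.Background(), ticks, func(_ context.Context, last int) (int, error) {
		calls++
		if failEvery > 0 && calls%failEvery == 0 {
			return 0, errors.New("source timeout")
		}
		return last + 1, nil
	})
}

func main() {
	fmt.Printf("%+v\n", simulate(600, 7))
}
```

A failed poll leaves `last` untouched, so the next call still receives the most recent good result.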
### 4. HTTP Handler (`period/handler.go`)
Provides REST and WebSocket endpoints for data access.
**Endpoints:**
- `GET /metrics?period=5m&aggregate=cpu_average` - Historical data
- `WS /metrics?period=5m&interval=5s` - Streaming updates
**Query Parameters:**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `period` | Filter | (none) | Time range (5m, 15m, 1h, 1d, 1mo) |
| `aggregate` | string | (varies) | Aggregation mode |
| `interval` | duration | 1s | WebSocket update interval |
| `limit` | int | 0 | Max results (0 = all) |
| `offset` | int | 0 | Pagination offset |
| `keyword` | string | "" | Fuzzy search filter |
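A sketch of how the parameters in the table might be parsed and defaulted, using only the standard library. `metricsQuery`, `parseMetricsQuery`, and `nonNegativeInt` are hypothetical names, and the validation rules (rejecting unknown periods and negative numbers) are assumptions rather than documented handler behaviour.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// metricsQuery holds the parsed parameters (hypothetical type).
type metricsQuery struct {
	Period    string
	Aggregate string
	Interval  time.Duration
	Limit     int
	Offset    int
	Keyword   string
}

var validPeriods = map[string]bool{
	"5m": true, "15m": true, "1h": true, "1d": true, "1mo": true,
}

// parseMetricsQuery applies the defaults from the table: interval 1s,
// limit 0 (all), offset 0, and an empty period when none is given.
func parseMetricsQuery(raw string) (metricsQuery, error) {
	q := metricsQuery{Interval: time.Second}
	v, err := url.ParseQuery(raw)
	if err != nil {
		return q, fmt.Errorf("malformed query: %w", err)
	}
	q.Period = v.Get("period")
	if q.Period != "" && !validPeriods[q.Period] {
		return q, fmt.Errorf("invalid period %q", q.Period)
	}
	q.Aggregate = v.Get("aggregate")
	q.Keyword = v.Get("keyword")
	if s := v.Get("interval"); s != "" {
		if q.Interval, err = time.ParseDuration(s); err != nil || q.Interval <= 0 {
			return q, fmt.Errorf("invalid interval %q", s)
		}
	}
	if q.Limit, err = nonNegativeInt(v, "limit"); err != nil {
		return q, err
	}
	if q.Offset, err = nonNegativeInt(v, "offset"); err != nil {
		return q, err
	}
	return q, nil
}

// nonNegativeInt reads an optional integer parameter, defaulting to 0.
func nonNegativeInt(v url.Values, key string) (int, error) {
	s := v.Get(key)
	if s == "" {
		return 0, nil
	}
	n, err := strconv.Atoi(s)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid %s %q", key, s)
	}
	return n, nil
}

func main() {
	q, err := parseMetricsQuery("period=5m&aggregate=cpu_average&interval=5s&limit=10")
	fmt.Printf("%+v %v\n", q, err)
}
```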
## Implementations
### SystemInfo Poller
Collects system metrics using `gopsutil`:
```go
type SystemInfo struct {
	Timestamp  int64
	CPUAverage *float64
	Memory     mem.VirtualMemoryStat
	Disks      map[string]disk.UsageStat
	DisksIO    map[string]*disk.IOCountersStat
	Network    net.IOCountersStat
	Sensors    Sensors
}
```
**Aggregation Modes:**
- `cpu_average` - CPU usage percentage
- `memory_usage` - Memory used in bytes
- `memory_usage_percent` - Memory usage percentage
- `disks_read_speed` - Disk read speed (bytes/s)
- `disks_write_speed` - Disk write speed (bytes/s)
- `disks_iops` - Disk I/O operations per second
- `disk_usage` - Disk usage in bytes
- `network_speed` - Upload/download speed (bytes/s)
- `network_transfer` - Total bytes transferred
- `sensor_temperature` - Temperature sensor readings
### Uptime Poller
Monitors route health and calculates uptime statistics:
```go
type RouteAggregate struct {
	Alias         string
	DisplayName   string
	Uptime        float32 // Percentage healthy
	Downtime      float32 // Percentage unhealthy
	Idle          float32 // Percentage napping/starting
	AvgLatency    float32 // Average latency in ms
	CurrentStatus HealthStatus
	Statuses      []Status // Historical statuses
}
```
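To make the percentage fields concrete, here is one way such values could be derived from a list of status samples. Everything in it is an assumption for illustration: the `sample` and `summary` types, the status strings, the 0-100 scale, and averaging latency over healthy samples only. The package's actual aggregation may differ.

```go
package main

import "fmt"

// sample is a hypothetical status record.
type sample struct {
	status    string // assumed values: "healthy", "unhealthy", "napping", "starting"
	latencyMs float32
}

// summary holds percentages on a 0-100 scale (assumed) and latency in ms.
type summary struct {
	uptime, downtime, idle, avgLatency float32
}

// summarizeRoute counts samples per category and converts the counts to
// percentages of all samples; latency is averaged over healthy samples.
func summarizeRoute(samples []sample) summary {
	if len(samples) == 0 {
		return summary{}
	}
	var up, down, idle int
	var latencySum float32
	for _, s := range samples {
		switch s.status {
		case "healthy":
			up++
			latencySum += s.latencyMs
		case "unhealthy":
			down++
		case "napping", "starting":
			idle++
		}
	}
	total := float32(len(samples))
	out := summary{
		uptime:   100 * float32(up) / total,
		downtime: 100 * float32(down) / total,
		idle:     100 * float32(idle) / total,
	}
	if up > 0 {
		out.avgLatency = latencySum / float32(up)
	}
	return out
}

func main() {
	fmt.Printf("%+v\n", summarizeRoute([]sample{
		{"healthy", 10}, {"healthy", 30}, {"unhealthy", 0}, {"napping", 0},
	}))
}
```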
## Data Flow
```mermaid
flowchart TD
A[Data Source] -->|PollFunc| B[Poller]
B -->|Add| C[Period.Entries]
C -->|Ring Buffer| D[(Memory)]
D -->|Every 5min| E[(data/metrics/*.json)]
B -->|HTTP Request| F[ServeHTTP]
F -->|Filter| G[Get]
G -->|Aggregate| H[Response]
F -->|WebSocket| I[PeriodicWrite]
I -->|interval| J[Push Updates]
```
## Persistence
Data is persisted to `data/metrics/` as JSON files:
```json
{
  "entries": {
    "5m": {
      "entries": [...],
      "interval": "3s"
    },
    "15m": {...},
    "1h": {...},
    "1d": {...},
    "1mo": {...}
  }
}
```
**On Load:**
- Validates and fixes interval mismatches
- Reconstructs temporal spacing for historical entries
## Thread Safety
- `Period[T]` uses `sync.RWMutex` for concurrent access
- `Entries[T]` is written only through appends and has no internal locking, so it is safe only with a single writer
- `Poller` uses `synk.Value[T]` for atomic last result storage
## Creating a New Poller
```go
type MyData struct {
	Value int
}

type MyAggregate struct {
	Values []int
}

var MyPoller = period.NewPoller(
	"my_poll_name",
	func(ctx context.Context, last *MyData) (*MyData, error) {
		// Fetch data
		return &MyData{Value: 42}, nil
	},
	func(entries []*MyData, query url.Values) (int, MyAggregate) {
		// Aggregate for API response (illustrative: collect each entry's Value)
		values := make([]int, 0, len(entries))
		for _, e := range entries {
			values = append(values, e.Value)
		}
		return len(entries), MyAggregate{Values: values}
	},
)

func init() {
	MyPoller.Start()
}
```
## Error Handling
- Poll errors are aggregated over 30-second windows
- Errors are logged with frequency counts
- Individual sensor warnings (e.g., ENODATA) are suppressed gracefully
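Aggregating errors with frequency counts, as listed above, can be sketched with a small counter that is flushed once per window. `errorWindow`, `record`, `flush`, and `summarize` are hypothetical names, and the output format is an assumption; the package's log format is not shown in this README.

```go
package main

import (
	"errors"
	"fmt"
)

// errorWindow counts identical error messages until it is flushed.
type errorWindow struct {
	counts map[string]int
	order  []string // first-seen order, for stable output
}

func newErrorWindow() *errorWindow {
	return &errorWindow{counts: map[string]int{}}
}

// record adds one occurrence of err to the current window.
func (w *errorWindow) record(err error) {
	if err == nil {
		return
	}
	msg := err.Error()
	if w.counts[msg] == 0 {
		w.order = append(w.order, msg)
	}
	w.counts[msg]++
}

// flush returns one line per distinct error with its count, then resets
// the window. A poller would call this every 30 seconds and log the lines.
func (w *errorWindow) flush() []string {
	lines := make([]string, 0, len(w.order))
	for _, msg := range w.order {
		lines = append(lines, fmt.Sprintf("%s (x%d)", msg, w.counts[msg]))
	}
	w.counts = map[string]int{}
	w.order = nil
	return lines
}

// summarize records the given messages as errors and flushes once.
func summarize(msgs ...string) []string {
	w := newErrorWindow()
	for _, m := range msgs {
		w.record(errors.New(m))
	}
	return w.flush()
}

func main() {
	for _, line := range summarize("disk: timeout", "disk: timeout", "sensors: no data") {
		fmt.Println(line)
	}
}
```

Grouping by message keeps a source that fails on every one-second poll from producing thirty identical log lines per window.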
| Failure Mode | Impact | Recovery |
| ------------------------- | -------------------- | -------------------------------- |
| Data source timeout | Missing data point | Logged, aggregated, continues |
| Disk read failure | No historical data | Starts fresh, warns |
| Disk write failure | Data loss on restart | Continues, retries next interval |
| Memory allocation failure | OOM risk             | None (Go runtime aborts on OOM)  |