mirror of
https://github.com/yusing/godoxy.git
synced 2026-04-24 09:18:31 +02:00
docs: add per package README for implementation details (AI generated with human review)

internal/metrics/README.md (new file)

# Metrics Package

System monitoring and metrics collection for GoDoxy with time-series storage and REST/WebSocket APIs.

## Overview

This package provides a unified metrics collection system that:

- Polls system and route data at regular intervals
- Stores historical data across multiple time periods
- Exposes both REST and WebSocket APIs for consumption

### Primary Consumers

- `internal/api/v1/metrics/` - REST API endpoints
- WebUI - Real-time charts
- `internal/metrics/uptime/` - Route health monitoring

### Non-goals

- Metric aggregation from external sources
- Alerting (handled by `internal/notif/`)
- Long-term storage (30-day retention only)

### Stability

Internal package. See `internal/metrics/period/README.md` for the core framework documentation.

## Packages

### `period/`

Generic time-bucketed metrics storage framework with:

- `Period[T]` - Multi-timeframe container
- `Poller[T, A]` - Background data collector
- `Entries[T]` - Circular buffer for time-series data

See [period/README.md](./period/README.md) for full documentation.

### `uptime/`

Route health status monitoring using the period framework.

### `systeminfo/`

System metrics collection (CPU, memory, disk, network, sensors) using the period framework.

## Architecture

```mermaid
graph TB
    subgraph "Data Sources"
        SI[SystemInfo Poller]
        UP[Uptime Poller]
    end

    subgraph "Period Framework"
        P[Period&lt;T&gt; Generic]
        E[Entries&lt;T&gt; Ring Buffer]
        PL[Poller&lt;T, A&gt; Orchestrator]
        H[Handler HTTP API]
    end

    subgraph "Storage"
        JSON[(data/metrics/*.json)]
    end

    P --> E
    PL --> P
    PL --> SI
    PL --> UP
    H --> PL
    PL --> JSON
```

## Configuration Surface

No explicit configuration. Pollers respect `common.MetricsDisable*` flags:

| Flag                    | Disables                  |
| ----------------------- | ------------------------- |
| `MetricsDisableCPU`     | CPU percentage collection |
| `MetricsDisableMemory`  | Memory statistics         |
| `MetricsDisableDisk`    | Disk usage and I/O        |
| `MetricsDisableNetwork` | Network counters          |
| `MetricsDisableSensors` | Temperature sensors       |

## Dependency and Integration Map

### Internal Dependencies

- `github.com/yusing/goutils/task` - Lifetime management
- `internal/types` - Health check types

### External Dependencies

- `github.com/shirou/gopsutil/v4` - System metrics collection
- `github.com/puzpuzpuz/xsync/v4` - Atomic value storage
- `github.com/bytedance/sonic` - JSON serialization

## Observability

### Logs

| Level   | When                                        |
| ------- | ------------------------------------------- |
| `Debug` | Poller start, data load/save                |
| `Error` | Data source failures (aggregated every 30s) |

## Failure Modes and Recovery

| Failure Mode              | Impact               | Recovery                         |
| ------------------------- | -------------------- | -------------------------------- |
| Data source timeout       | Missing data point   | Logged, aggregated, continues    |
| Disk read failure         | No historical data   | Starts fresh, warns              |
| Disk write failure        | Data loss on restart | Continues, retries next interval |
| Memory allocation failure | OOM risk             | Go runtime handles               |

internal/metrics/period/README.md (new file)

# Period Metrics

Provides time-bucketed metrics storage with configurable periods, enabling historical data aggregation and real-time streaming.

## Overview

The period package implements a generic metrics collection system with time-bucketed storage. It collects data points at regular intervals and stores them in predefined time windows (5m, 15m, 1h, 1d, 1mo) with automatic persistence and HTTP/WebSocket APIs.

### Primary Consumers

- `internal/metrics/uptime` - Route health status storage
- `internal/metrics/systeminfo` - System metrics storage
- `internal/api/v1/metrics` - HTTP API endpoints

### Non-goals

- Does not provide data visualization
- Does not implement alerting or anomaly detection
- Does not support custom time periods (fixed set only)
- Does not provide data aggregation across multiple instances

### Stability

Internal package. Public interfaces are stable.

## Public API

### Exported Types

#### Period[T] Struct

```go
type Period[T any] struct {
	Entries map[Filter]*Entries[T]
	mu      sync.RWMutex
}
```

Container for all time-bucketed entries. Maps each filter to its corresponding `Entries`.

**Methods:**

- `Add(info T)` - Adds a data point to all periods
- `Get(filter Filter) ([]T, bool)` - Gets entries for a specific period
- `Total() int` - Returns the total number of entries across all periods
- `ValidateAndFixIntervals()` - Validates and fixes intervals after loading

#### Entries[T] Struct

```go
type Entries[T any] struct {
	entries  [maxEntries]T
	index    int
	count    int
	interval time.Duration
	lastAdd  time.Time
}
```

Circular buffer holding up to 100 entries for a single time period.

**Methods:**

- `Add(now time.Time, info T)` - Adds an entry with interval checking
- `Get() []T` - Returns all entries in chronological order

#### Filter Type

```go
type Filter string
```

Time period filter.

```go
const (
	MetricsPeriod5m  Filter = "5m"
	MetricsPeriod15m Filter = "15m"
	MetricsPeriod1h  Filter = "1h"
	MetricsPeriod1d  Filter = "1d"
	MetricsPeriod1mo Filter = "1mo"
)
```

#### Poller[T, A] Struct

```go
type Poller[T any, A any] struct {
	name         string
	poll         PollFunc[T]
	aggregate    AggregateFunc[T, A]
	resultFilter FilterFunc[T]
	period       *Period[T]
	lastResult   synk.Value[T]
	errs         []pollErr
}
```

Generic poller that collects data at regular intervals.

**Type Aliases:**

```go
type PollFunc[T any] func(ctx context.Context, lastResult T) (T, error)
type AggregateFunc[T any, A any] func(entries []T, query url.Values) (total int, result A)
type FilterFunc[T any] func(entries []T, keyword string) (filtered []T)
```

#### ResponseType[AggregateT]

```go
type ResponseType[AggregateT any] struct {
	Total int        `json:"total"`
	Data  AggregateT `json:"data"`
}
```

Standard response format for API endpoints.

### Exported Functions

#### Period Constructors

```go
func NewPeriod[T any]() *Period[T]
```

Creates a new `Period[T]` with all time buckets initialized.

#### Poller Constructors

```go
func NewPoller[T any, A any](
	name string,
	poll PollFunc[T],
	aggregator AggregateFunc[T, A],
) *Poller[T, A]
```

Creates a new poller with the specified name, poll function, and aggregator.

```go
func (p *Poller[T, A]) WithResultFilter(filter FilterFunc[T]) *Poller[T, A]
```

Adds a result filter to the poller for keyword-based filtering.
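
For illustration, a `FilterFunc[T]` for a string-keyed entry type might look like this (the `RouteStatus` type and its `Name` field are hypothetical, not part of the package):

```go
package main

import (
	"fmt"
	"strings"
)

// RouteStatus is an illustrative entry type; real consumers define their own T.
type RouteStatus struct {
	Name string
	Up   bool
}

// filterByName matches the FilterFunc[T] shape: keep entries whose
// name contains the keyword.
func filterByName(entries []RouteStatus, keyword string) []RouteStatus {
	filtered := make([]RouteStatus, 0, len(entries))
	for _, e := range entries {
		if strings.Contains(e.Name, keyword) {
			filtered = append(filtered, e)
		}
	}
	return filtered
}

func main() {
	routes := []RouteStatus{{"web", true}, {"db", true}, {"webhook", false}}
	fmt.Println(filterByName(routes, "web")) // matches "web" and "webhook"
}
```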

#### Poller Methods

```go
func (p *Poller[T, A]) Get(filter Filter) ([]T, bool)
```

Gets entries for a specific time period.

```go
func (p *Poller[T, A]) GetLastResult() T
```

Gets the most recently collected data point.

```go
func (p *Poller[T, A]) Start()
```

Starts the poller. Launches a background goroutine that:

1. Polls for data at 1-second intervals
2. Stores data in all time buckets
3. Saves data to disk every 5 minutes
4. Reports errors every 30 seconds

```go
func (p *Poller[T, A]) ServeHTTP(c *gin.Context)
```

HTTP handler for data retrieval.

## Architecture

### Core Components

```mermaid
flowchart TD
    subgraph Poller
        Poll[PollFunc] -->|Collects| Data[Data Point T]
        Data -->|Adds to| Period[Period T]
        Period -->|Stores in| Buckets[Time Buckets]
    end

    subgraph "Time Buckets"
        Bucket5m[5m Bucket] -->|Holds| Entries5m[100 Entries]
        Bucket15m[15m Bucket] -->|Holds| Entries15m[100 Entries]
        Bucket1h[1h Bucket] -->|Holds| Entries1h[100 Entries]
        Bucket1d[1d Bucket] -->|Holds| Entries1d[100 Entries]
        Bucket1mo[1mo Bucket] -->|Holds| Entries1mo[100 Entries]
    end

    subgraph API
        Handler[ServeHTTP] -->|Queries| Period
        Period -->|Returns| Aggregate[Aggregated Data]
        WebSocket[WebSocket] -->|Streams| Periodic[Periodic Updates]
    end

    subgraph Persistence
        Save[save] -->|Writes| File[JSON File]
        File -->|Loads| Load[load]
    end
```

### Data Flow

```mermaid
sequenceDiagram
    participant Collector
    participant Poller
    participant Period
    participant Entries as Time Bucket
    participant Storage

    Poller->>Poller: Start background goroutine

    loop Every 1 second
        Poller->>Collector: poll(ctx, lastResult)
        Collector-->>Poller: data, error
        Poller->>Period: Add(data)
        Period->>Entries: Add(now, data)
        Entries->>Entries: Circular buffer write

        Poller->>Poller: Check save interval (every 5min)
        alt Save interval reached
            Poller->>Storage: Save to JSON
        end

        alt Error interval reached (30s)
            Poller->>Poller: Gather and log errors
        end
    end
```

### Time Periods

| Filter | Duration   | Interval     | Max Entries |
| ------ | ---------- | ------------ | ----------- |
| `5m`   | 5 minutes  | 3 seconds    | 100         |
| `15m`  | 15 minutes | 9 seconds    | 100         |
| `1h`   | 1 hour     | 36 seconds   | 100         |
| `1d`   | 1 day      | 14.4 minutes | 100         |
| `1mo`  | 30 days    | 7.2 hours    | 100         |

### Circular Buffer Behavior

```mermaid
stateDiagram-v2
    [*] --> Empty: NewEntries()
    Empty --> Filling: Add(entry 1)
    Filling --> Filling: Add(entry 2..N)
    Filling --> Full: count == maxEntries
    Full --> Overwrite: Add(new entry)
    Overwrite --> Overwrite: index = (index + 1) % max
```

When full, new entries overwrite the oldest entries (FIFO).
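
A minimal sketch of this overwrite behavior, simplified to a 5-slot buffer of ints (the real `Entries[T]` holds 100 entries and also checks the write interval):

```go
package main

import "fmt"

const maxEntries = 5 // the real buffer uses 100

// ring is a simplified fixed-size circular buffer.
type ring struct {
	entries [maxEntries]int
	index   int // next write position
	count   int // number of valid entries
}

func (r *ring) Add(v int) {
	r.entries[r.index] = v
	r.index = (r.index + 1) % maxEntries
	if r.count < maxEntries {
		r.count++
	}
}

// Get returns entries in chronological order, oldest first.
func (r *ring) Get() []int {
	start := 0
	if r.count == maxEntries {
		start = r.index // once full, index points at the oldest entry
	}
	out := make([]int, 0, r.count)
	for i := 0; i < r.count; i++ {
		out = append(out, r.entries[(start+i)%maxEntries])
	}
	return out
}

func main() {
	var r ring
	for v := 1; v <= 7; v++ { // two entries past capacity
		r.Add(v)
	}
	fmt.Println(r.Get()) // oldest entries (1, 2) were overwritten
}
```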

## Configuration Surface

### Poller Configuration

| Parameter            | Type          | Default        | Description                |
| -------------------- | ------------- | -------------- | -------------------------- |
| `PollInterval`       | time.Duration | 1s             | How often to poll for data |
| `saveInterval`       | time.Duration | 5m             | How often to save to disk  |
| `gatherErrsInterval` | time.Duration | 30s            | Error aggregation interval |
| `saveBaseDir`        | string        | `data/metrics` | Persistence directory      |

### HTTP Query Parameters

| Parameter          | Description                         |
| ------------------ | ----------------------------------- |
| `period`           | Time filter (5m, 15m, 1h, 1d, 1mo)  |
| `aggregate`        | Aggregation mode (package-specific) |
| `interval`         | WebSocket update interval           |
| `limit` / `offset` | Pagination parameters               |

## Dependency and Integration Map

### Internal Dependencies

None.

### External Dependencies

| Dependency                                 | Purpose                  |
| ------------------------------------------ | ------------------------ |
| `github.com/gin-gonic/gin`                 | HTTP handling            |
| `github.com/yusing/goutils/http/websocket` | WebSocket streaming      |
| `github.com/bytedance/sonic`               | JSON serialization       |
| `github.com/yusing/goutils/task`           | Lifetime management      |
| `github.com/puzpuzpuz/xsync/v4`            | Concurrent value storage |

### Integration Points

- Poll function collects data from external sources
- Aggregate function transforms data for visualization
- Filter function enables keyword-based filtering
- HTTP handler provides REST/WebSocket endpoints

## Observability

### Logs

| Level | When                                  |
| ----- | ------------------------------------- |
| Debug | Poller start/stop, buffer adjustments |
| Error | Load/save failures                    |
| Info  | Data loaded from disk                 |

### Metrics

None exposed directly. Poll errors are accumulated and logged periodically.

## Security Considerations

- The HTTP endpoint should be protected by authentication
- Data files contain potentially sensitive metrics
- No input validation beyond basic query parsing
- WebSocket connections have configurable intervals

## Failure Modes and Recovery

| Failure              | Detection              | Recovery                            |
| -------------------- | ---------------------- | ----------------------------------- |
| Poll function error  | `poll()` returns error | Error accumulated, logged every 30s |
| JSON load failure    | `os.ReadFile` error    | Continues with empty period         |
| JSON save failure    | `Encode` error         | Error accumulated, logged           |
| Context cancellation | `<-ctx.Done()`         | Goroutine exits, final save         |
| Disk full            | Write error            | Error logged, continues             |

### Persistence Behavior

1. On startup, attempts to load existing data from `data/metrics/{name}.json`
2. If the file doesn't exist, starts with empty data
3. On load, validates and fixes intervals
4. Saves every 5 minutes during operation
5. Performs a final save on goroutine exit

## Usage Examples

### Defining a Custom Poller

```go
import "github.com/yusing/godoxy/internal/metrics/period"

type CustomMetric struct {
	Timestamp int64   `json:"timestamp"`
	Value     float64 `json:"value"`
	Name      string  `json:"name"`
}

func pollCustomMetric(ctx context.Context, last CustomMetric) (CustomMetric, error) {
	return CustomMetric{
		Timestamp: time.Now().Unix(),
		Value:     readSensorValue(),
		Name:      "sensor_1",
	}, nil
}

func aggregateCustomMetric(entries []CustomMetric, query url.Values) (int, Aggregated) {
	aggregated := make(Aggregated, 0, len(entries))
	// Aggregate logic here
	return len(aggregated), aggregated
}

var CustomPoller = period.NewPoller("custom", pollCustomMetric, aggregateCustomMetric)
```

### Starting the Poller

```go
// In your main initialization
CustomPoller.Start()
```

### Accessing Data

```go
// Get all entries from the last hour
entries, ok := CustomPoller.Get(period.MetricsPeriod1h)
if ok {
	for _, entry := range entries {
		fmt.Printf("Value: %.2f at %d\n", entry.Value, entry.Timestamp)
	}
}

// Get the most recent value
latest := CustomPoller.GetLastResult()
```

### HTTP Integration

```go
import "github.com/gin-gonic/gin"

func setupMetricsAPI(r *gin.Engine) {
	r.GET("/api/metrics/custom", CustomPoller.ServeHTTP)
}
```

**API Examples:**

```bash
# Get last collected data
GET /api/metrics/custom

# Get 1-hour history
GET /api/metrics/custom?period=1h

# Get 1-day history with aggregation
GET /api/metrics/custom?period=1d&aggregate=cpu_average
```

### WebSocket Integration

```go
// Client-side example: connections automatically receive updates
// at the specified interval.
ws, _, err := websocket.DefaultDialer.Dial("ws://localhost/api/metrics/custom?interval=5s", nil)
if err != nil {
	log.Fatal(err)
}
defer ws.Close()

for {
	_, msg, err := ws.ReadMessage()
	if err != nil {
		break
	}
	// Process the update
	_ = msg
}
```

### Data Persistence Format

```json
{
  "entries": {
    "5m": {
      "entries": [...],
      "interval": 3000000000
    },
    "15m": {...},
    "1h": {...},
    "1d": {...},
    "1mo": {...}
  }
}
```

## Performance Characteristics

- O(1) add to circular buffer
- O(1) get (returns slice view)
- O(n) serialization where n = total entries
- Memory: O(5 * 100 * sizeof(T)) = fixed overhead
- JSON load/save: O(n) where n = total entries

## Testing Notes

- Test circular buffer overflow behavior
- Test interval validation after load
- Test aggregation with various query parameters
- Test concurrent access to period
- Test error accumulation and reporting

## Related Packages

- `internal/metrics/uptime` - Uses period for health status
- `internal/metrics/systeminfo` - Uses period for system metrics

internal/metrics/systeminfo/README.md (new file)

# System Info

Collects and aggregates system metrics including CPU, memory, disk, network, and sensor data with configurable aggregation modes.

## Overview

The systeminfo package uses a custom fork of the [gopsutil](https://github.com/shirou/gopsutil) library to collect system metrics and integrates with the `period` package for time-bucketed storage. It supports collecting CPU, memory, disk, network, and sensor data with configurable collection intervals and aggregation modes for visualization.

### Primary Consumers

- `internal/api/v1/metrics` - HTTP endpoint for system metrics
- `internal/homepage` - Dashboard system monitoring widgets
- Monitoring and alerting systems

### Non-goals

- Does not provide alerting on metric thresholds
- Does not persist metrics beyond the period package retention
- Does not provide data aggregation across multiple instances
- Does not support custom metric collectors

### Stability

Internal package. Data format and API are stable.

## Public API

### Exported Types

#### SystemInfo Struct

```go
type SystemInfo struct {
	Timestamp  int64                           `json:"timestamp"`
	CPUAverage *float64                        `json:"cpu_average"`
	Memory     mem.VirtualMemoryStat           `json:"memory"`
	Disks      map[string]disk.UsageStat       `json:"disks"`
	DisksIO    map[string]*disk.IOCountersStat `json:"disks_io"`
	Network    net.IOCountersStat              `json:"network"`
	Sensors    Sensors                         `json:"sensors"`
}
```

Container for all system metrics at a point in time.

**Fields:**

- `Timestamp` - Unix timestamp of collection
- `CPUAverage` - Average CPU usage percentage (0-100)
- `Memory` - Virtual memory statistics (used, total, percent, etc.)
- `Disks` - Disk usage by partition mountpoint
- `DisksIO` - Disk I/O counters by device name
- `Network` - Network I/O counters for the primary interface
- `Sensors` - Hardware temperature sensor readings

#### Sensors Type

```go
type Sensors []sensors.TemperatureStat
```

Slice of temperature sensor readings.

#### Aggregated Type

```go
type Aggregated []map[string]any
```

Aggregated data suitable for charting libraries like Recharts. Each entry is a map with a timestamp and values.

#### SystemInfoAggregateMode Type

```go
type SystemInfoAggregateMode string
```

Aggregation mode constants:

```go
const (
	SystemInfoAggregateModeCPUAverage         SystemInfoAggregateMode = "cpu_average"
	SystemInfoAggregateModeMemoryUsage        SystemInfoAggregateMode = "memory_usage"
	SystemInfoAggregateModeMemoryUsagePercent SystemInfoAggregateMode = "memory_usage_percent"
	SystemInfoAggregateModeDisksReadSpeed     SystemInfoAggregateMode = "disks_read_speed"
	SystemInfoAggregateModeDisksWriteSpeed    SystemInfoAggregateMode = "disks_write_speed"
	SystemInfoAggregateModeDisksIOPS          SystemInfoAggregateMode = "disks_iops"
	SystemInfoAggregateModeDiskUsage          SystemInfoAggregateMode = "disk_usage"
	SystemInfoAggregateModeNetworkSpeed       SystemInfoAggregateMode = "network_speed"
	SystemInfoAggregateModeNetworkTransfer    SystemInfoAggregateMode = "network_transfer"
	SystemInfoAggregateModeSensorTemperature  SystemInfoAggregateMode = "sensor_temperature"
)
```

### Exported Variables

#### Poller

```go
var Poller = period.NewPoller("system_info", getSystemInfo, aggregate)
```

Pre-configured poller for system info metrics. Start it with `Poller.Start()`.

### Exported Functions

#### getSystemInfo

```go
func getSystemInfo(ctx context.Context, lastResult *SystemInfo) (*SystemInfo, error)
```

Collects current system metrics. This is the poll function passed to the period poller.

**Features:**

- Concurrent collection of all metric categories
- Handles partial failures gracefully
- Calculates rates based on the previous result (for speed metrics)
- Logs warnings for non-critical errors

**Rate Calculations:**

- Disk read/write speed: `(currentBytes - lastBytes) / interval`
- Disk IOPS: `(currentCount - lastCount) / interval`
- Network speed: `(currentBytes - lastBytes) / interval`

#### aggregate

```go
func aggregate(entries []*SystemInfo, query url.Values) (total int, result Aggregated)
```

Aggregates system info entries for a specific mode. Called by the period poller.

**Query Parameters:**

- `aggregate` - The aggregation mode (see constants above)

**Returns:**

- `total` - Number of aggregated entries
- `result` - Slice of maps suitable for charting

## Architecture

### Core Components

```mermaid
flowchart TD
    subgraph Collection
        G[gopsutil] -->|CPU| CPU[CPU Percent]
        G -->|Memory| Mem[Virtual Memory]
        G -->|Disks| Disk[Partitions & IO]
        G -->|Network| Net[Network Counters]
        G -->|Sensors| Sens[Temperature]
    end

    subgraph Poller
        Collect[getSystemInfo] -->|Aggregates| Info[SystemInfo]
        Info -->|Stores in| Period[Period SystemInfo]
    end

    subgraph "Aggregation Modes"
        CPUAvg[cpu_average]
        MemUsage[memory_usage]
        MemPercent[memory_usage_percent]
        DiskRead[disks_read_speed]
        DiskWrite[disks_write_speed]
        DiskIOPS[disks_iops]
        DiskUsage[disk_usage]
        NetSpeed[network_speed]
        NetTransfer[network_transfer]
        SensorTemp[sensor_temperature]
    end

    Period -->|Query with| Aggregate[aggregate function]
    Aggregate --> CPUAvg
    Aggregate --> MemUsage
    Aggregate --> DiskRead
```

### Data Flow

```mermaid
sequenceDiagram
    participant gopsutil
    participant Poller
    participant Period
    participant API

    Poller->>Poller: Start background goroutine

    loop Every 1 second
        Poller->>gopsutil: Collect CPU (500ms timeout)
        Poller->>gopsutil: Collect Memory
        Poller->>gopsutil: Collect Disks (partition + IO)
        Poller->>gopsutil: Collect Network
        Poller->>gopsutil: Collect Sensors

        gopsutil-->>Poller: SystemInfo
        Poller->>Period: Add(SystemInfo)
    end

    API->>Period: Get(filter)
    Period-->>API: Entries
    API->>API: aggregate(entries, mode)
    API-->>Client: Chart data
```

### Collection Categories

| Category | Data Source                                            | Optional | Rate Metrics          |
| -------- | ------------------------------------------------------ | -------- | --------------------- |
| CPU      | `cpu.PercentWithContext`                               | Yes      | No                    |
| Memory   | `mem.VirtualMemoryWithContext`                         | Yes      | No                    |
| Disks    | `disk.PartitionsWithContext` + `disk.UsageWithContext` | Yes      | Yes (read/write/IOPS) |
| Network  | `net.IOCountersWithContext`                            | Yes      | Yes (upload/download) |
| Sensors  | `sensors.TemperaturesWithContext`                      | Yes      | No                    |

### Aggregation Modes

Each mode produces chart-friendly output:

**CPU Average:**

```json
[
  { "timestamp": 1704892800, "cpu_average": 45.5 },
  { "timestamp": 1704892810, "cpu_average": 52.3 }
]
```

**Memory Usage:**

```json
[
  { "timestamp": 1704892800, "memory_usage": 8388608000 },
  { "timestamp": 1704892810, "memory_usage": 8453440000 }
]
```

**Disk Read/Write Speed:**

```json
[
  { "timestamp": 1704892800, "sda": 10485760, "sdb": 5242880 },
  { "timestamp": 1704892810, "sda": 15728640, "sdb": 4194304 }
]
```
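
Rows like these can be produced by emitting one map per sample; a simplified sketch for the `cpu_average` mode (the `sample` type is trimmed to the relevant fields and stands in for `SystemInfo`):

```go
package main

import "fmt"

// sample is a trimmed stand-in for SystemInfo with only the fields
// the cpu_average mode needs.
type sample struct {
	Timestamp  int64
	CPUAverage *float64
}

// aggregateCPU mirrors the shape of the package's Aggregated type:
// one map per entry, keyed for the charting library.
func aggregateCPU(entries []sample) []map[string]any {
	out := make([]map[string]any, 0, len(entries))
	for _, e := range entries {
		if e.CPUAverage == nil {
			continue // CPU collection disabled or failed for this sample
		}
		out = append(out, map[string]any{
			"timestamp":   e.Timestamp,
			"cpu_average": *e.CPUAverage,
		})
	}
	return out
}

func main() {
	v := 45.5
	rows := aggregateCPU([]sample{
		{Timestamp: 1704892800, CPUAverage: &v},
		{Timestamp: 1704892810}, // nil CPUAverage is skipped
	})
	fmt.Println(len(rows), rows[0]["cpu_average"])
}
```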
|
||||
|
||||
## Configuration Surface
|
||||
|
||||
### Disabling Metrics Categories
|
||||
|
||||
Metrics categories can be disabled via environment variables:
|
||||
|
||||
| Variable | Purpose |
|
||||
| ------------------------- | ------------------------------------------- |
|
||||
| `METRICS_DISABLE_CPU` | Set to "true" to disable CPU collection |
|
||||
| `METRICS_DISABLE_MEMORY` | Set to "true" to disable memory collection |
|
||||
| `METRICS_DISABLE_DISK` | Set to "true" to disable disk collection |
|
||||
| `METRICS_DISABLE_NETWORK` | Set to "true" to disable network collection |
|
||||
| `METRICS_DISABLE_SENSORS` | Set to "true" to disable sensor collection |
|
||||
|
||||
## Dependency and Integration Map
|
||||
|
||||
### Internal Dependencies
|
||||
|
||||
| Package | Purpose |
|
||||
| -------------------------------- | --------------------- |
|
||||
| `internal/metrics/period` | Time-bucketed storage |
|
||||
| `internal/common` | Configuration flags |
|
||||
| `github.com/yusing/goutils/errs` | Error handling |
|
||||
|
||||
### External Dependencies
|
||||
|
||||
| Dependency | Purpose |
|
||||
| ------------------------------- | ------------------------- |
|
||||
| `github.com/shirou/gopsutil/v4` | System metrics collection |
|
||||
| `github.com/rs/zerolog` | Logging |
|
||||
|
||||
### Integration Points
|
||||
|
||||
- gopsutil provides raw system metrics
|
||||
- period package handles storage and persistence
|
||||
- HTTP API provides query interface
|
||||
|
||||
## Observability
|
||||
|
||||
### Logs
|
||||
|
||||
| Level | When |
|
||||
| ----- | ------------------------------------------ |
|
||||
| Warn | Non-critical errors (e.g., no sensor data) |
|
||||
| Error | Other errors |
|
||||
|
||||
### Metrics
|
||||
|
||||
No metrics exposed directly. Collection errors are logged.
|
||||
|
||||
## Failure Modes and Recovery
|
||||
|
||||
| Failure | Detection | Recovery |
|
||||
| --------------- | ------------------------------------ | -------------------------------- |
|
||||
| No CPU data | `cpu.Percent` returns error | Skip and log later with warning |
|
||||
| No memory data | `mem.VirtualMemory` returns error | Skip and log later with warning |
|
||||
| No disk data | `disk.Usage` returns error for all | Skip and log later with warning |
|
||||
| No network data | `net.IOCounters` returns error | Skip and log later with warning |
|
||||
| No sensor data | `sensors.Temperatures` returns error | Skip and log later with warning |
|
||||
| Context timeout | Context deadline exceeded | Return partial data with warning |
|
||||
|
||||
### Partial Collection
|
||||
|
||||
The package uses `gperr.NewGroup` to collect errors from concurrent operations:
|
||||
|
||||
```go
|
||||
errs := gperr.NewGroup("failed to get system info")
|
||||
errs.Go(func() error { return s.collectCPUInfo(ctx) })
|
||||
errs.Go(func() error { return s.collectMemoryInfo(ctx) })
|
||||
// ...
|
||||
result := errs.Wait()
|
||||
```
|
||||
|
||||
Warnings (like `ENODATA`) are logged but don't fail the collection.
|
||||
Critical errors cause the function to return an error.
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Starting the Poller
|
||||
|
||||
```go
|
||||
import "github.com/yusing/godoxy/internal/metrics/systeminfo"
|
||||
|
||||
func init() {
|
||||
systeminfo.Poller.Start()
|
||||
}
|
||||
```
|
||||
|
||||
### HTTP Endpoint
|
||||
|
||||
```go
|
||||
import "github.com/gin-gonic/gin"
|
||||
|
||||
func setupMetricsAPI(r *gin.Engine) {
|
||||
r.GET("/api/metrics/system", systeminfo.Poller.ServeHTTP)
|
||||
}
|
||||
```
|
||||
|
||||
**API Examples:**
|
||||
|
||||
```bash
|
||||
# Get latest metrics
|
||||
curl http://localhost:8080/api/metrics/system
|
||||
|
||||
# Get 1-hour history with CPU aggregation
|
||||
curl "http://localhost:8080/api/metrics/system?period=1h&aggregate=cpu_average"
|
||||
|
||||
# Get 24-hour memory usage history
|
||||
curl "http://localhost:8080/api/metrics/system?period=1d&aggregate=memory_usage_percent"
|
||||
|
||||
# Get disk I/O for the last hour
|
||||
curl "http://localhost:8080/api/metrics/system?period=1h&aggregate=disks_read_speed"
|
||||
```
|
||||
|
||||
### WebSocket Streaming

```javascript
const ws = new WebSocket(
  "ws://localhost:8080/api/metrics/system?period=1m&interval=5s&aggregate=cpu_average"
);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log("CPU:", data.data);
};
```

### Direct Data Access

```go
// Get entries for the last hour
entries, ok := systeminfo.Poller.Get(period.MetricsPeriod1h)
for _, entry := range entries {
	if entry.CPUAverage != nil {
		fmt.Printf("CPU: %.1f%% at %d\n", *entry.CPUAverage, entry.Timestamp)
	}
}

// Get the most recent metrics
latest := systeminfo.Poller.GetLastResult()
```
### Disabling Metrics at Runtime

```go
import (
	"github.com/yusing/godoxy/internal/common"
	"github.com/yusing/godoxy/internal/metrics/systeminfo"
)

func init() {
	// Disable expensive sensor collection
	common.MetricsDisableSensors = true
	systeminfo.Poller.Start()
}
```
## Performance Characteristics

- O(1) per metric collection (gopsutil handles complexity)
- Concurrent collection of all categories
- Rate calculations O(n) where n = number of disks/interfaces
- Memory: O(5 * 100 * sizeof(SystemInfo))
- JSON serialization O(n) for API responses

### Collection Latency

| Category | Typical Latency                        |
| -------- | -------------------------------------- |
| CPU      | ~10-50ms                               |
| Memory   | ~5-10ms                                |
| Disks    | ~10-100ms (depends on partition count) |
| Network  | ~5-10ms                                |
| Sensors  | ~10-50ms                               |

## Testing Notes

- Mock gopsutil calls for unit tests
- Test with real metrics to verify rate calculations
- Test aggregation modes with various data sets
- Verify disable flags work correctly
- Test partial failure scenarios

## Related Packages

- `internal/metrics/period` - Time-bucketed storage
- `internal/api/v1/metrics` - HTTP API endpoints
- `github.com/shirou/gopsutil/v4` - System metrics library
402
internal/metrics/uptime/README.md
Normal file
@@ -0,0 +1,402 @@
# Uptime

Tracks and aggregates route health status over time, providing uptime/downtime statistics and latency metrics.

## Overview

The uptime package monitors route health status and calculates uptime percentages over configurable time periods. It integrates with the `period` package for historical storage and provides aggregated statistics for visualization.

### Primary Consumers

- `internal/api/v1/metrics` - HTTP endpoint for uptime data
- `internal/homepage` - Dashboard uptime widgets
- Monitoring and alerting systems

### Non-goals

- Does not perform health checks (handled by `internal/route/routes`)
- Does not provide alerting on downtime
- Does not persist data beyond the period package retention
- Does not aggregate across multiple GoDoxy instances

### Stability

Internal package. Data format and API are stable.
## Public API

### Exported Types

#### StatusByAlias

```go
type StatusByAlias struct {
	Map       map[string]routes.HealthInfoWithoutDetail `json:"statuses"`
	Timestamp int64                                     `json:"timestamp"`
}
```

Container for health status of all routes at a specific time.

#### Status

```go
type Status struct {
	Status    types.HealthStatus `json:"status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
	Latency   int32              `json:"latency"`
	Timestamp int64              `json:"timestamp"`
}
```

Individual route status at a point in time.

#### RouteAggregate

```go
type RouteAggregate struct {
	Alias         string             `json:"alias"`
	DisplayName   string             `json:"display_name"`
	Uptime        float32            `json:"uptime"`
	Downtime      float32            `json:"downtime"`
	Idle          float32            `json:"idle"`
	AvgLatency    float32            `json:"avg_latency"`
	IsDocker      bool               `json:"is_docker"`
	IsExcluded    bool               `json:"is_excluded"`
	CurrentStatus types.HealthStatus `json:"current_status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
	Statuses      []Status           `json:"statuses"`
}
```

Aggregated statistics for a single route.

#### Aggregated

```go
type Aggregated []RouteAggregate
```

Slice of route aggregates, sorted alphabetically by alias.
### Exported Variables

#### Poller

```go
var Poller = period.NewPoller("uptime", getStatuses, aggregateStatuses)
```

Pre-configured poller for uptime metrics. Start with `Poller.Start()`.

### Unexported Functions

#### getStatuses

```go
func getStatuses(ctx context.Context, _ StatusByAlias) (StatusByAlias, error)
```

Collects the current status of all routes. Called by the period poller every second.

**Returns:**

- `StatusByAlias` - Map of all route statuses with the current timestamp
- `error` - Always nil (errors are logged internally)

#### aggregateStatuses

```go
func aggregateStatuses(entries []StatusByAlias, query url.Values) (int, Aggregated)
```

Aggregates status entries into per-route statistics.

**Query Parameters:**

- `period` - Time filter (`5m`, `15m`, `1h`, `1d`, `1mo`)
- `limit` - Maximum number of routes to return (0 = all)
- `offset` - Offset for pagination
- `keyword` - Fuzzy search keyword for filtering routes

**Returns:**

- `int` - Total number of routes matching the query
- `Aggregated` - Slice of route aggregates
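The `keyword`/`offset`/`limit` handling can be sketched as a plain slice filter. This is illustrative only: `paginate` is a hypothetical helper, and a case-insensitive substring check stands in for the real fuzzy matching; only the `(total, page)` return shape mirrors `aggregateStatuses`.

```go
package main

import (
	"fmt"
	"strings"
)

// paginate filters aliases by keyword, then applies offset and limit.
// It returns the total match count before pagination so clients can
// render page controls.
func paginate(aliases []string, keyword string, limit, offset int) (int, []string) {
	matched := make([]string, 0, len(aliases))
	for _, a := range aliases {
		if keyword == "" || strings.Contains(strings.ToLower(a), strings.ToLower(keyword)) {
			matched = append(matched, a)
		}
	}
	total := len(matched)
	if offset >= total {
		return total, nil
	}
	matched = matched[offset:]
	if limit > 0 && limit < len(matched) {
		matched = matched[:limit]
	}
	return total, matched
}

func main() {
	total, page := paginate([]string{"api", "db", "api-docs", "cache"}, "api", 1, 1)
	fmt.Println(total, page) // total counts all matches; page is one entry starting at offset 1
}
```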
## Architecture

### Core Components

```mermaid
flowchart TD
    subgraph Health Monitoring
        Routes[Routes] -->|GetHealthInfoWithoutDetail| Status[Status Map]
        Status -->|Polls every| Second[1 Second]
    end

    subgraph Poller
        Poll[getStatuses] -->|Collects| StatusByAlias
        StatusByAlias -->|Stores in| Period[Period StatusByAlias]
    end

    subgraph Aggregation
        Query[Query Params] -->|Filters| Aggregate[aggregateStatuses]
        Aggregate -->|Calculates| RouteAggregate
        RouteAggregate -->|Uptime| UP[Uptime %]
        RouteAggregate -->|Downtime| DOWN[Downtime %]
        RouteAggregate -->|Idle| IDLE[Idle %]
        RouteAggregate -->|Latency| LAT[Avg Latency]
    end

    subgraph Response
        RouteAggregate -->|JSON| Client[API Client]
    end
```

### Data Flow

```mermaid
sequenceDiagram
    participant Routes as Route Registry
    participant Poller as Uptime Poller
    participant Period as Period Storage
    participant API as HTTP API

    loop Every second
        Routes->>Poller: GetHealthInfoWithoutDetail()
        Poller->>Period: Add(StatusByAlias)
    end

    API->>Period: Get(filter)
    Period-->>API: Entries
    API->>API: aggregateStatuses()
    API-->>Client: Aggregated JSON
```

### Status Types

| Status      | Description                    | Counted as Uptime? |
| ----------- | ------------------------------ | ------------------ |
| `healthy`   | Route is responding normally   | Yes                |
| `unhealthy` | Route is not responding        | No                 |
| `unknown`   | Status could not be determined | Excluded           |
| `napping`   | Route is in idle/sleep state   | Idle (separate)    |
| `starting`  | Route is starting up           | Idle (separate)    |

### Calculation Formula

For a set of status entries:

```
Uptime     = healthy_count / total_count
Downtime   = unhealthy_count / total_count
Idle       = (napping_count + starting_count) / total_count
AvgLatency = sum(latency) / count
```

Note: `unknown` statuses are excluded from all calculations.
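As a concrete sketch of these formulas (illustrative only; `aggregate` below implements the formulas directly and is not the package's actual aggregation code):

```go
package main

import "fmt"

// aggregate applies the formulas above: unknown entries are excluded from
// every calculation, napping/starting count as idle, and latency is averaged
// over the counted entries.
func aggregate(statuses []string, latencies []float64) (uptime, downtime, idle, avgLatency float64) {
	var healthy, unhealthy, idleN, total int
	var latSum float64
	for i, s := range statuses {
		switch s {
		case "unknown":
			continue // excluded from all calculations
		case "healthy":
			healthy++
		case "unhealthy":
			unhealthy++
		case "napping", "starting":
			idleN++
		}
		total++
		latSum += latencies[i]
	}
	if total == 0 {
		return 0, 0, 0, 0
	}
	return float64(healthy) / float64(total),
		float64(unhealthy) / float64(total),
		float64(idleN) / float64(total),
		latSum / float64(total)
}

func main() {
	// 5 samples, one of which is unknown and therefore ignored
	up, down, idle, lat := aggregate(
		[]string{"healthy", "healthy", "unhealthy", "napping", "unknown"},
		[]float64{40, 60, 0, 0, 0},
	)
	fmt.Printf("uptime=%.2f downtime=%.2f idle=%.2f avg=%.1fms\n", up, down, idle, lat)
}
```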
## Configuration Surface

No explicit configuration. The poller uses period package defaults:

| Parameter     | Value                        |
| ------------- | ---------------------------- |
| Poll Interval | 1 second                     |
| Retention     | 5m, 15m, 1h, 1d, 1mo periods |

## Dependency and Integration Map

### Internal Dependencies

| Package                   | Purpose               |
| ------------------------- | --------------------- |
| `internal/route/routes`   | Health info retrieval |
| `internal/metrics/period` | Time-bucketed storage |
| `internal/types`          | HealthStatus enum     |
| `internal/metrics/utils`  | Query utilities       |

### External Dependencies

| Dependency                               | Purpose          |
| ---------------------------------------- | ---------------- |
| `github.com/lithammer/fuzzysearch/fuzzy` | Keyword matching |
| `github.com/bytedance/sonic`             | JSON marshaling  |

### Integration Points

- Route health monitors provide status via `routes.GetHealthInfoWithoutDetail()`
- The period poller handles data collection and storage
- The HTTP API provides a query interface via `Poller.ServeHTTP`

## Observability

### Logs

Poller lifecycle events and errors are logged via zerolog.

### Metrics

No metrics are exposed directly. Status data is available via the API.

## Failure Modes and Recovery

| Failure                          | Detection                         | Recovery                       |
| -------------------------------- | --------------------------------- | ------------------------------ |
| Route health monitor unavailable | Empty map returned                | Log warning, continue          |
| Invalid query parameters         | `aggregateStatuses` returns empty | Return empty result            |
| Poller panic                     | Goroutine crash                   | Process terminates             |
| Persistence failure              | Load/save error                   | Log, continue with empty state |

### Fuzzy Search

The package uses `fuzzy.MatchFold` for keyword matching:

- Case-insensitive matching
- Substring matching
- Fuzzy ranking
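The matching behaves roughly like a case-insensitive subsequence check. The hand-rolled `matchFold` below is a sketch of that behavior for illustration only; the package itself delegates to `fuzzy.MatchFold` from lithammer/fuzzysearch rather than implementing it.

```go
package main

import (
	"fmt"
	"unicode"
)

// matchFold reports whether needle occurs in haystack as a case-insensitive
// subsequence, e.g. "dkr" matches "docker-route" because d, k, r appear in order.
func matchFold(needle, haystack string) bool {
	n := []rune(needle)
	i := 0
	for _, h := range haystack {
		if i == len(n) {
			break
		}
		if unicode.ToLower(h) == unicode.ToLower(n[i]) {
			i++
		}
	}
	return i == len(n)
}

func main() {
	fmt.Println(matchFold("api", "API-Server"))   // true: case-insensitive
	fmt.Println(matchFold("dkr", "docker-route")) // true: in-order subsequence
	fmt.Println(matchFold("xyz", "api"))          // false
}
```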
## Usage Examples

### Starting the Poller

```go
import "github.com/yusing/godoxy/internal/metrics/uptime"

func init() {
	uptime.Poller.Start()
}
```

### HTTP Endpoint

```go
import (
	"github.com/gin-gonic/gin"

	"github.com/yusing/godoxy/internal/metrics/uptime"
)

func setupUptimeAPI(r *gin.Engine) {
	r.GET("/api/uptime", uptime.Poller.ServeHTTP)
}
```

**API Examples:**

```bash
# Get latest status
curl http://localhost:8080/api/uptime

# Get 1-hour history
curl "http://localhost:8080/api/uptime?period=1h"

# Get with limit and offset (pagination)
curl "http://localhost:8080/api/uptime?limit=10&offset=0"

# Search for routes containing "api"
curl "http://localhost:8080/api/uptime?keyword=api"

# Combined query
curl "http://localhost:8080/api/uptime?period=1d&limit=20&offset=0&keyword=docker"
```

### WebSocket Streaming

```javascript
const ws = new WebSocket(
  "ws://localhost:8080/api/uptime?period=1m&interval=5s"
);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  data.data.forEach((route) => {
    console.log(`${route.display_name}: ${route.uptime * 100}% uptime`);
  });
};
```

### Direct Data Access

```go
// Get entries for the last hour
entries, ok := uptime.Poller.Get(period.MetricsPeriod1h)
for _, entry := range entries {
	for alias, status := range entry.Map {
		fmt.Printf("Route %s: %s (latency: %dms)\n",
			alias, status.Status, status.Latency.Milliseconds())
	}
}

// Get aggregated statistics (aggregateStatuses is unexported,
// so this is only callable from within the uptime package)
_, agg := aggregateStatuses(entries, url.Values{
	"period": []string{"1h"},
})

for _, route := range agg {
	fmt.Printf("%s: %.1f%% uptime, %.1fms avg latency\n",
		route.DisplayName, route.Uptime*100, route.AvgLatency)
}
```
### Response Format

**Latest Status Response:**

```json
{
  "alias1": {
    "status": "healthy",
    "latency": 45
  },
  "alias2": {
    "status": "unhealthy",
    "latency": 0
  }
}
```

**Aggregated Response:**

```json
{
  "total": 5,
  "data": [
    {
      "alias": "api-server",
      "display_name": "API Server",
      "uptime": 0.98,
      "downtime": 0.02,
      "idle": 0.0,
      "avg_latency": 45.5,
      "is_docker": true,
      "is_excluded": false,
      "current_status": "healthy",
      "statuses": [
        { "status": "healthy", "latency": 45, "timestamp": 1704892800 }
      ]
    }
  ]
}
```

## Performance Characteristics

- O(n) status collection per poll, where n = number of routes
- O(m * k) aggregation, where m = entries and k = routes
- Memory: O(p * r * s), where p = periods, r = routes, s = status size
- Fuzzy search is O(routes * keyword_length)

## Testing Notes

- Mock `routes.GetHealthInfoWithoutDetail()` for testing
- Test aggregation with known status sequences
- Verify pagination and filtering logic
- Test fuzzy search matching

## Related Packages

- `internal/route/routes` - Route health monitoring
- `internal/metrics/period` - Time-bucketed metrics storage
- `internal/types` - Health status types