docs: add per package README for implementation details (AI generated with human review)

This commit is contained in:
yusing
2026-01-08 23:39:19 +08:00
parent 2f2828ec48
commit 86b655be3c
54 changed files with 13825 additions and 124 deletions

internal/metrics/README.md

@@ -0,0 +1,118 @@
# Metrics Package
System monitoring and metrics collection for GoDoxy with time-series storage and REST/WebSocket APIs.
## Overview
This package provides a unified metrics collection system that:
- Polls system and route data at regular intervals
- Stores historical data across multiple time periods
- Exposes both REST and WebSocket APIs for consumption
### Primary Consumers
- `internal/api/v1/metrics/` - REST API endpoints
- WebUI - Real-time charts
- `internal/metrics/uptime/` - Route health monitoring
### Non-goals
- Metric aggregation from external sources
- Alerting (handled by `internal/notif/`)
- Long-term storage (30-day retention only)
### Stability
Internal package. See `internal/metrics/period/README.md` for the core framework documentation.
## Packages
### `period/`
Generic time-bucketed metrics storage framework with:
- `Period[T]` - Multi-timeframe container
- `Poller[T, A]` - Background data collector
- `Entries[T]` - Circular buffer for time-series data
See [period/README.md](./period/README.md) for full documentation.
### `uptime/`
Route health status monitoring using the period framework.
### `systeminfo/`
System metrics collection (CPU, memory, disk, network, sensors) using the period framework.
## Architecture
```mermaid
graph TB
subgraph "Data Sources"
SI[SystemInfo Poller]
UP[Uptime Poller]
end
subgraph "Period Framework"
P["Period<T> Generic"]
E["Entries<T> Ring Buffer"]
PL["Poller<T, A> Orchestrator"]
H[Handler HTTP API]
end
subgraph "Storage"
JSON[(data/metrics/*.json)]
end
P --> E
PL --> P
PL --> SI
PL --> UP
H --> PL
PL --> JSON
```
## Configuration Surface
No explicit configuration. Pollers respect `common.MetricsDisable*` flags:
| Flag | Disables |
| ----------------------- | ------------------------- |
| `MetricsDisableCPU` | CPU percentage collection |
| `MetricsDisableMemory` | Memory statistics |
| `MetricsDisableDisk` | Disk usage and I/O |
| `MetricsDisableNetwork` | Network counters |
| `MetricsDisableSensors` | Temperature sensors |
## Dependency and Integration Map
### Internal Dependencies
- `internal/types` - Health check types
### External Dependencies
- `github.com/yusing/goutils/task` - Lifetime management
- `github.com/shirou/gopsutil/v4` - System metrics collection
- `github.com/puzpuzpuz/xsync/v4` - Atomic value storage
- `github.com/bytedance/sonic` - JSON serialization
## Observability
### Logs
| Level | When |
| ------- | ------------------------------------------- |
| `Debug` | Poller start, data load/save |
| `Error` | Data source failures (aggregated every 30s) |
## Failure Modes and Recovery
| Failure Mode | Impact | Recovery |
| ------------------------- | -------------------- | -------------------------------- |
| Data source timeout | Missing data point | Logged, aggregated, continues |
| Disk read failure | No historical data | Starts fresh, warns |
| Disk write failure | Data loss on restart | Continues, retries next interval |
| Memory allocation failure | OOM risk | Go runtime handles |

internal/metrics/period/README.md

@@ -0,0 +1,470 @@
# Period Metrics
Provides time-bucketed metrics storage with configurable periods, enabling historical data aggregation and real-time streaming.
## Overview
The period package implements a generic metrics collection system with time-bucketed storage. It collects data points at regular intervals and stores them in predefined time windows (5m, 15m, 1h, 1d, 1mo) with automatic persistence and HTTP/WebSocket APIs.
### Primary Consumers
- `internal/metrics/uptime` - Route health status storage
- `internal/metrics/systeminfo` - System metrics storage
- `internal/api/v1/metrics` - HTTP API endpoints
### Non-goals
- Does not provide data visualization
- Does not implement alerting or anomaly detection
- Does not support custom time periods (fixed set only)
- Does not provide data aggregation across multiple instances
### Stability
Internal package. Public interfaces are stable.
## Public API
### Exported Types
#### Period[T] Struct
```go
type Period[T any] struct {
Entries map[Filter]*Entries[T]
mu sync.RWMutex
}
```
Container for all time-bucketed entries. Maps each filter to its corresponding `Entries`.
**Methods:**
- `Add(info T)` - Adds a data point to all periods
- `Get(filter Filter) ([]T, bool)` - Gets entries for a specific period
- `Total() int` - Returns total number of entries across all periods
- `ValidateAndFixIntervals()` - Validates and fixes intervals after loading
#### Entries[T] Struct
```go
type Entries[T any] struct {
entries [maxEntries]T
index int
count int
interval time.Duration
lastAdd time.Time
}
```
Circular buffer holding up to 100 entries for a single time period.
**Methods:**
- `Add(now time.Time, info T)` - Adds an entry with interval checking
- `Get() []T` - Returns all entries in chronological order
#### Filter Type
```go
type Filter string
```
Time period filter.
```go
const (
MetricsPeriod5m Filter = "5m"
MetricsPeriod15m Filter = "15m"
MetricsPeriod1h Filter = "1h"
MetricsPeriod1d Filter = "1d"
MetricsPeriod1mo Filter = "1mo"
)
```
#### Poller[T, A] Struct
```go
type Poller[T any, A any] struct {
name string
poll PollFunc[T]
aggregate AggregateFunc[T, A]
resultFilter FilterFunc[T]
period *Period[T]
lastResult synk.Value[T]
errs []pollErr
}
```
Generic poller that collects data at regular intervals.
**Function Types:**
```go
type PollFunc[T any] func(ctx context.Context, lastResult T) (T, error)
type AggregateFunc[T any, A any] func(entries []T, query url.Values) (total int, result A)
type FilterFunc[T any] func(entries []T, keyword string) (filtered []T)
```
#### ResponseType[AggregateT]
```go
type ResponseType[AggregateT any] struct {
Total int `json:"total"`
Data AggregateT `json:"data"`
}
```
Standard response format for API endpoints.
### Exported Functions
#### Period Constructors
```go
func NewPeriod[T any]() *Period[T]
```
Creates a new `Period[T]` with all time buckets initialized.
#### Poller Constructors
```go
func NewPoller[T any, A any](
name string,
poll PollFunc[T],
aggregator AggregateFunc[T, A],
) *Poller[T, A]
```
Creates a new poller with the specified name, poll function, and aggregator.
```go
func (p *Poller[T, A]) WithResultFilter(filter FilterFunc[T]) *Poller[T, A]
```
Adds a result filter to the poller for keyword-based filtering.
#### Poller Methods
```go
func (p *Poller[T, A]) Get(filter Filter) ([]T, bool)
```
Gets entries for a specific time period.
```go
func (p *Poller[T, A]) GetLastResult() T
```
Gets the most recently collected data point.
```go
func (p *Poller[T, A]) Start()
```
Starts the poller. Launches a background goroutine that:
1. Polls for data at 1-second intervals
2. Stores data in all time buckets
3. Saves data to disk every 5 minutes
4. Reports errors every 30 seconds
```go
func (p *Poller[T, A]) ServeHTTP(c *gin.Context)
```
HTTP handler for data retrieval.
## Architecture
### Core Components
```mermaid
flowchart TD
subgraph Poller
Poll[PollFunc] -->|Collects| Data[Data Point T]
Data -->|Adds to| Period[Period T]
Period -->|Stores in| Buckets[Time Buckets]
end
subgraph "Time Buckets"
Bucket5m[5m Bucket] -->|Holds| Entries5m[100 Entries]
Bucket15m[15m Bucket] -->|Holds| Entries15m[100 Entries]
Bucket1h[1h Bucket] -->|Holds| Entries1h[100 Entries]
Bucket1d[1d Bucket] -->|Holds| Entries1d[100 Entries]
Bucket1mo[1mo Bucket] -->|Holds| Entries1mo[100 Entries]
end
subgraph API
Handler[ServeHTTP] -->|Queries| Period
Period -->|Returns| Aggregate[Aggregated Data]
WebSocket[WebSocket] -->|Streams| Periodic[Periodic Updates]
end
subgraph Persistence
Save[save] -->|Writes| File[JSON File]
File -->|Loads| Load[load]
end
```
### Data Flow
```mermaid
sequenceDiagram
participant Collector
participant Poller
participant Period
participant Entries as Time Bucket
participant Storage
Poller->>Poller: Start background goroutine
loop Every 1 second
Poller->>Collector: poll(ctx, lastResult)
Collector-->>Poller: data, error
Poller->>Period: Add(data)
Period->>Entries: Add(now, data)
Entries->>Entries: Circular buffer write
Poller->>Poller: Check save interval (every 5min)
alt Save interval reached
Poller->>Storage: Save to JSON
end
alt Error interval reached (30s)
Poller->>Poller: Gather and log errors
end
end
```
### Time Periods
| Filter | Duration | Interval | Max Entries |
| ------ | ---------- | ------------ | ----------- |
| `5m` | 5 minutes | 3 seconds | 100 |
| `15m` | 15 minutes | 9 seconds | 100 |
| `1h` | 1 hour | 36 seconds | 100 |
| `1d` | 1 day | 14.4 minutes | 100 |
| `1mo` | 30 days | 7.2 hours | 100 |
### Circular Buffer Behavior
```mermaid
stateDiagram-v2
[*] --> Empty: NewEntries()
Empty --> Filling: Add(entry 1)
Filling --> Filling: Add(entry 2..N)
Filling --> Full: count == maxEntries
Full --> Overwrite: Add(new entry)
Overwrite --> Overwrite: index = (index + 1) % max
```
When full, new entries overwrite oldest entries (FIFO).
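That behavior can be illustrated with a minimal generic ring buffer whose fields mirror the `Entries[T]` struct above (an illustrative sketch with a 4-slot capacity, not the package's actual 100-slot implementation):

```go
package main

import "fmt"

const maxEntries = 4 // the real buffer uses 100

type ring[T any] struct {
	entries [maxEntries]T
	index   int // next write position
	count   int // number of valid entries
}

func (r *ring[T]) Add(v T) {
	r.entries[r.index] = v
	r.index = (r.index + 1) % maxEntries // wrap around when full
	if r.count < maxEntries {
		r.count++
	}
}

// Get returns entries in chronological order, oldest first.
func (r *ring[T]) Get() []T {
	out := make([]T, 0, r.count)
	if r.count < maxEntries {
		return append(out, r.entries[:r.count]...)
	}
	// Buffer is full: the oldest entry sits at the current write index.
	out = append(out, r.entries[r.index:]...)
	return append(out, r.entries[:r.index]...)
}

func main() {
	var r ring[int]
	for i := 1; i <= 6; i++ { // 6 adds into a 4-slot buffer
		r.Add(i)
	}
	fmt.Println(r.Get()) // prints [3 4 5 6]: entries 1 and 2 were overwritten
}
```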
## Configuration Surface
### Poller Configuration
| Parameter | Type | Default | Description |
| -------------------- | ------------- | -------------- | -------------------------- |
| `PollInterval` | time.Duration | 1s | How often to poll for data |
| `saveInterval` | time.Duration | 5m | How often to save to disk |
| `gatherErrsInterval` | time.Duration | 30s | Error aggregation interval |
| `saveBaseDir` | string | `data/metrics` | Persistence directory |
### HTTP Query Parameters
| Parameter | Description |
| ------------------ | ----------------------------------- |
| `period` | Time filter (5m, 15m, 1h, 1d, 1mo) |
| `aggregate` | Aggregation mode (package-specific) |
| `interval` | WebSocket update interval |
| `limit` / `offset` | Pagination parameters |
## Dependency and Integration Map
### Internal Dependencies
None.
### External Dependencies
| Dependency | Purpose |
| ------------------------------------------ | ------------------------ |
| `github.com/gin-gonic/gin` | HTTP handling |
| `github.com/yusing/goutils/http/websocket` | WebSocket streaming |
| `github.com/bytedance/sonic` | JSON serialization |
| `github.com/yusing/goutils/task` | Lifetime management |
| `github.com/puzpuzpuz/xsync/v4` | Concurrent value storage |
### Integration Points
- Poll function collects data from external sources
- Aggregate function transforms data for visualization
- Filter function enables keyword-based filtering
- HTTP handler provides REST/WebSocket endpoints
## Observability
### Logs
| Level | When |
| ----- | ------------------------------------- |
| Debug | Poller start/stop, buffer adjustments |
| Error | Load/save failures |
| Info | Data loaded from disk |
### Metrics
None exposed directly. Poll errors are accumulated and logged periodically.
## Security Considerations
- HTTP endpoint should be protected via authentication
- Data files contain potentially sensitive metrics
- No input validation beyond basic query parsing
- WebSocket connections have configurable intervals
## Failure Modes and Recovery
| Failure | Detection | Recovery |
| -------------------- | ---------------------- | ----------------------------------- |
| Poll function error | `poll()` returns error | Error accumulated, logged every 30s |
| JSON load failure | `os.ReadFile` error | Continue with empty period |
| JSON save failure | `Encode` error | Error accumulated, logged |
| Context cancellation | `<-ctx.Done()` | Goroutine exits, final save |
| Disk full | Write error | Error logged, continue |
### Persistence Behavior
1. On startup, attempts to load existing data from `data/metrics/{name}.json`
2. If the file doesn't exist, starts with empty data
3. On load, validates and fixes intervals
4. Saves every 5 minutes during operation
5. Performs a final save on goroutine exit
## Usage Examples
### Defining a Custom Poller
```go
import (
	"context"
	"net/url"
	"time"

	"github.com/yusing/godoxy/internal/metrics/period"
)

type CustomMetric struct {
	Timestamp int64   `json:"timestamp"`
	Value     float64 `json:"value"`
	Name      string  `json:"name"`
}

// Aggregated is the chart-friendly shape returned by the aggregator.
type Aggregated []map[string]any

func pollCustomMetric(ctx context.Context, last CustomMetric) (CustomMetric, error) {
	return CustomMetric{
		Timestamp: time.Now().Unix(),
		Value:     readSensorValue(), // your data source
		Name:      "sensor_1",
	}, nil
}

func aggregateCustomMetric(entries []CustomMetric, query url.Values) (int, Aggregated) {
	aggregated := make(Aggregated, 0, len(entries))
	for _, e := range entries {
		aggregated = append(aggregated, map[string]any{
			"timestamp": e.Timestamp,
			"value":     e.Value,
		})
	}
	return len(aggregated), aggregated
}

var CustomPoller = period.NewPoller("custom", pollCustomMetric, aggregateCustomMetric)
```
### Starting the Poller
```go
// In your main initialization
CustomPoller.Start()
```
### Accessing Data
```go
// Get all entries from the last hour
entries, ok := CustomPoller.Get(period.MetricsPeriod1h)
if ok {
for _, entry := range entries {
fmt.Printf("Value: %.2f at %d\n", entry.Value, entry.Timestamp)
}
}
// Get the most recent value
latest := CustomPoller.GetLastResult()
```
### HTTP Integration
```go
import "github.com/gin-gonic/gin"
func setupMetricsAPI(r *gin.Engine) {
r.GET("/api/metrics/custom", CustomPoller.ServeHTTP)
}
```
**API Examples:**
```bash
# Get last collected data
GET /api/metrics/custom
# Get 1-hour history
GET /api/metrics/custom?period=1h
# Get 1-day history with aggregation
GET /api/metrics/custom?period=1d&aggregate=cpu_average
```
### WebSocket Integration
```go
// WebSocket clients automatically receive updates at the requested interval.
// Example client (assumes the gorilla/websocket package):
ws, _, err := websocket.DefaultDialer.Dial("ws://localhost/api/metrics/custom?interval=5s", nil)
if err != nil {
	log.Fatal(err)
}
defer ws.Close()
for {
	_, msg, err := ws.ReadMessage()
	if err != nil {
		break
	}
	handleUpdate(msg) // process the update (your handler)
}
```
### Data Persistence Format
```json
{
"entries": {
"5m": {
"entries": [...],
"interval": 3000000000
},
"15m": {...},
"1h": {...},
"1d": {...},
"1mo": {...}
}
}
```
## Performance Characteristics
- O(1) add to circular buffer
- O(1) get (returns slice view)
- O(n) serialization where n = total entries
- Memory: O(5 * 100 * sizeof(T)) = fixed overhead
- JSON load/save: O(n) where n = total entries
## Testing Notes
- Test circular buffer overflow behavior
- Test interval validation after load
- Test aggregation with various query parameters
- Test concurrent access to period
- Test error accumulation and reporting
## Related Packages
- `internal/metrics/uptime` - Uses period for health status
- `internal/metrics/systeminfo` - Uses period for system metrics

internal/metrics/systeminfo/README.md

@@ -0,0 +1,439 @@
# System Info
Collects and aggregates system metrics including CPU, memory, disk, network, and sensor data with configurable aggregation modes.
## Overview
The systeminfo package uses a custom fork of the [gopsutil](https://github.com/shirou/gopsutil) library to collect system metrics and integrates with the `period` package for time-bucketed storage. It supports collecting CPU, memory, disk, network, and sensor data with configurable collection intervals and aggregation modes for visualization.
### Primary Consumers
- `internal/api/v1/metrics` - HTTP endpoint for system metrics
- `internal/homepage` - Dashboard system monitoring widgets
- Monitoring and alerting systems
### Non-goals
- Does not provide alerting on metric thresholds
- Does not persist metrics beyond the period package retention
- Does not provide data aggregation across multiple instances
- Does not support custom metric collectors
### Stability
Internal package. Data format and API are stable.
## Public API
### Exported Types
#### SystemInfo Struct
```go
type SystemInfo struct {
Timestamp int64 `json:"timestamp"`
CPUAverage *float64 `json:"cpu_average"`
Memory mem.VirtualMemoryStat `json:"memory"`
Disks map[string]disk.UsageStat `json:"disks"`
DisksIO map[string]*disk.IOCountersStat `json:"disks_io"`
Network net.IOCountersStat `json:"network"`
Sensors Sensors `json:"sensors"`
}
```
Container for all system metrics at a point in time.
**Fields:**
- `Timestamp` - Unix timestamp of collection
- `CPUAverage` - Average CPU usage percentage (0-100)
- `Memory` - Virtual memory statistics (used, total, percent, etc.)
- `Disks` - Disk usage by partition mountpoint
- `DisksIO` - Disk I/O counters by device name
- `Network` - Network I/O counters for primary interface
- `Sensors` - Hardware temperature sensor readings
#### Sensors Type
```go
type Sensors []sensors.TemperatureStat
```
Slice of temperature sensor readings.
#### Aggregated Type
```go
type Aggregated []map[string]any
```
Aggregated data suitable for charting libraries like Recharts. Each entry is a map with timestamp and values.
#### SystemInfoAggregateMode Type
```go
type SystemInfoAggregateMode string
```
Aggregation mode constants:
```go
const (
SystemInfoAggregateModeCPUAverage SystemInfoAggregateMode = "cpu_average"
SystemInfoAggregateModeMemoryUsage SystemInfoAggregateMode = "memory_usage"
SystemInfoAggregateModeMemoryUsagePercent SystemInfoAggregateMode = "memory_usage_percent"
SystemInfoAggregateModeDisksReadSpeed SystemInfoAggregateMode = "disks_read_speed"
SystemInfoAggregateModeDisksWriteSpeed SystemInfoAggregateMode = "disks_write_speed"
SystemInfoAggregateModeDisksIOPS SystemInfoAggregateMode = "disks_iops"
SystemInfoAggregateModeDiskUsage SystemInfoAggregateMode = "disk_usage"
SystemInfoAggregateModeNetworkSpeed SystemInfoAggregateMode = "network_speed"
SystemInfoAggregateModeNetworkTransfer SystemInfoAggregateMode = "network_transfer"
SystemInfoAggregateModeSensorTemperature SystemInfoAggregateMode = "sensor_temperature"
)
```
### Exported Variables
#### Poller
```go
var Poller = period.NewPoller("system_info", getSystemInfo, aggregate)
```
Pre-configured poller for system info metrics. Start with `Poller.Start()`.
### Unexported Functions
#### getSystemInfo
```go
func getSystemInfo(ctx context.Context, lastResult *SystemInfo) (*SystemInfo, error)
```
Collects current system metrics. This is the poll function passed to the period poller.
**Features:**
- Concurrent collection of all metric categories
- Handles partial failures gracefully
- Calculates rates based on previous result (for speed metrics)
- Logs warnings for non-critical errors
**Rate Calculations:**
- Disk read/write speed: `(currentBytes - lastBytes) / interval`
- Disk IOPS: `(currentCount - lastCount) / interval`
- Network speed: `(currentBytes - lastBytes) / interval`
#### aggregate
```go
func aggregate(entries []*SystemInfo, query url.Values) (total int, result Aggregated)
```
Aggregates system info entries for a specific mode. Called by the period poller.
**Query Parameters:**
- `aggregate` - The aggregation mode (see constants above)
**Returns:**
- `total` - Number of aggregated entries
- `result` - Slice of maps suitable for charting
## Architecture
### Core Components
```mermaid
flowchart TD
subgraph Collection
G[gopsutil] -->|CPU| CPU[CPU Percent]
G -->|Memory| Mem[Virtual Memory]
G -->|Disks| Disk[Partitions & IO]
G -->|Network| Net[Network Counters]
G -->|Sensors| Sens[Temperature]
end
subgraph Poller
Collect[getSystemInfo] -->|Aggregates| Info[SystemInfo]
Info -->|Stores in| Period[Period SystemInfo]
end
subgraph "Aggregation Modes"
CPUAvg[cpu_average]
MemUsage[memory_usage]
MemPercent[memory_usage_percent]
DiskRead[disks_read_speed]
DiskWrite[disks_write_speed]
DiskIOPS[disks_iops]
DiskUsage[disk_usage]
NetSpeed[network_speed]
NetTransfer[network_transfer]
SensorTemp[sensor_temperature]
end
Period -->|Query with| Aggregate[aggregate function]
Aggregate --> CPUAvg
Aggregate --> MemUsage
Aggregate --> DiskRead
```
### Data Flow
```mermaid
sequenceDiagram
participant gopsutil
participant Poller
participant Period
participant API
Poller->>Poller: Start background goroutine
loop Every 1 second
Poller->>gopsutil: Collect CPU (500ms timeout)
Poller->>gopsutil: Collect Memory
Poller->>gopsutil: Collect Disks (partition + IO)
Poller->>gopsutil: Collect Network
Poller->>gopsutil: Collect Sensors
gopsutil-->>Poller: SystemInfo
Poller->>Period: Add(SystemInfo)
end
API->>Period: Get(filter)
Period-->>API: Entries
API->>API: aggregate(entries, mode)
API-->>Client: Chart data
```
### Collection Categories
| Category | Data Source | Optional | Rate Metrics |
| -------- | ------------------------------------------------------ | -------- | --------------------- |
| CPU | `cpu.PercentWithContext` | Yes | No |
| Memory | `mem.VirtualMemoryWithContext` | Yes | No |
| Disks | `disk.PartitionsWithContext` + `disk.UsageWithContext` | Yes | Yes (read/write/IOPS) |
| Network | `net.IOCountersWithContext` | Yes | Yes (upload/download) |
| Sensors | `sensors.TemperaturesWithContext` | Yes | No |
### Aggregation Modes
Each mode produces chart-friendly output:
**CPU Average:**
```json
[
{ "timestamp": 1704892800, "cpu_average": 45.5 },
{ "timestamp": 1704892810, "cpu_average": 52.3 }
]
```
**Memory Usage:**
```json
[
{ "timestamp": 1704892800, "memory_usage": 8388608000 },
{ "timestamp": 1704892810, "memory_usage": 8453440000 }
]
```
**Disk Read/Write Speed:**
```json
[
{ "timestamp": 1704892800, "sda": 10485760, "sdb": 5242880 },
{ "timestamp": 1704892810, "sda": 15728640, "sdb": 4194304 }
]
```
## Configuration Surface
### Disabling Metrics Categories
Metrics categories can be disabled via environment variables:
| Variable | Purpose |
| ------------------------- | ------------------------------------------- |
| `METRICS_DISABLE_CPU` | Set to "true" to disable CPU collection |
| `METRICS_DISABLE_MEMORY` | Set to "true" to disable memory collection |
| `METRICS_DISABLE_DISK` | Set to "true" to disable disk collection |
| `METRICS_DISABLE_NETWORK` | Set to "true" to disable network collection |
| `METRICS_DISABLE_SENSORS` | Set to "true" to disable sensor collection |
## Dependency and Integration Map
### Internal Dependencies
| Package | Purpose |
| -------------------------------- | --------------------- |
| `internal/metrics/period` | Time-bucketed storage |
| `internal/common` | Configuration flags |
| `github.com/yusing/goutils/errs` | Error handling |
### External Dependencies
| Dependency | Purpose |
| ------------------------------- | ------------------------- |
| `github.com/shirou/gopsutil/v4` | System metrics collection |
| `github.com/rs/zerolog` | Logging |
### Integration Points
- gopsutil provides raw system metrics
- period package handles storage and persistence
- HTTP API provides query interface
## Observability
### Logs
| Level | When |
| ----- | ------------------------------------------ |
| Warn | Non-critical errors (e.g., no sensor data) |
| Error | Other errors |
### Metrics
No metrics exposed directly. Collection errors are logged.
## Failure Modes and Recovery
| Failure | Detection | Recovery |
| --------------- | ------------------------------------ | -------------------------------- |
| No CPU data | `cpu.Percent` returns error | Skip and log later with warning |
| No memory data | `mem.VirtualMemory` returns error | Skip and log later with warning |
| No disk data | `disk.Usage` returns error for all | Skip and log later with warning |
| No network data | `net.IOCounters` returns error | Skip and log later with warning |
| No sensor data | `sensors.Temperatures` returns error | Skip and log later with warning |
| Context timeout | Context deadline exceeded | Return partial data with warning |
### Partial Collection
The package uses `gperr.NewGroup` to collect errors from concurrent operations:
```go
errs := gperr.NewGroup("failed to get system info")
errs.Go(func() error { return s.collectCPUInfo(ctx) })
errs.Go(func() error { return s.collectMemoryInfo(ctx) })
// ...
result := errs.Wait()
```
Warnings (like `ENODATA`) are logged but don't fail the collection.
Critical errors cause the function to return an error.
## Usage Examples
### Starting the Poller
```go
import "github.com/yusing/godoxy/internal/metrics/systeminfo"
func init() {
systeminfo.Poller.Start()
}
```
### HTTP Endpoint
```go
import "github.com/gin-gonic/gin"
func setupMetricsAPI(r *gin.Engine) {
r.GET("/api/metrics/system", systeminfo.Poller.ServeHTTP)
}
```
**API Examples:**
```bash
# Get latest metrics
curl http://localhost:8080/api/metrics/system
# Get 1-hour history with CPU aggregation
curl "http://localhost:8080/api/metrics/system?period=1h&aggregate=cpu_average"
# Get 24-hour memory usage history
curl "http://localhost:8080/api/metrics/system?period=1d&aggregate=memory_usage_percent"
# Get disk I/O for the last hour
curl "http://localhost:8080/api/metrics/system?period=1h&aggregate=disks_read_speed"
```
### WebSocket Streaming
```javascript
const ws = new WebSocket(
"ws://localhost:8080/api/metrics/system?period=1m&interval=5s&aggregate=cpu_average"
);
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log("CPU:", data.data);
};
```
### Direct Data Access
```go
// Get entries for the last hour
entries, ok := systeminfo.Poller.Get(period.MetricsPeriod1h)
if !ok {
	return // no data for this period yet
}
for _, entry := range entries {
if entry.CPUAverage != nil {
fmt.Printf("CPU: %.1f%% at %d\n", *entry.CPUAverage, entry.Timestamp)
}
}
// Get the most recent metrics
latest := systeminfo.Poller.GetLastResult()
```
### Disabling Metrics at Runtime
```go
import (
"github.com/yusing/godoxy/internal/common"
"github.com/yusing/godoxy/internal/metrics/systeminfo"
)
func init() {
// Disable expensive sensor collection
common.MetricsDisableSensors = true
systeminfo.Poller.Start()
}
```
## Performance Characteristics
- O(1) per metric collection (gopsutil handles complexity)
- Concurrent collection of all categories
- Rate calculations O(n) where n = number of disks/interfaces
- Memory: O(5 * 100 * sizeof(SystemInfo))
- JSON serialization O(n) for API responses
### Collection Latency
| Category | Typical Latency |
| -------- | -------------------------------------- |
| CPU | ~10-50ms |
| Memory | ~5-10ms |
| Disks | ~10-100ms (depends on partition count) |
| Network | ~5-10ms |
| Sensors | ~10-50ms |
## Testing Notes
- Mock gopsutil calls for unit tests
- Test with real metrics to verify rate calculations
- Test aggregation modes with various data sets
- Verify disable flags work correctly
- Test partial failure scenarios
## Related Packages
- `internal/metrics/period` - Time-bucketed storage
- `internal/api/v1/metrics` - HTTP API endpoints
- `github.com/shirou/gopsutil/v4` - System metrics library

internal/metrics/uptime/README.md

@@ -0,0 +1,402 @@
# Uptime
Tracks and aggregates route health status over time, providing uptime/downtime statistics and latency metrics.
## Overview
The uptime package monitors route health status and calculates uptime percentages over configurable time periods. It integrates with the `period` package for historical storage and provides aggregated statistics for visualization.
### Primary Consumers
- `internal/api/v1/metrics` - HTTP endpoint for uptime data
- `internal/homepage` - Dashboard uptime widgets
- Monitoring and alerting systems
### Non-goals
- Does not perform health checks (handled by `internal/route/routes`)
- Does not provide alerting on downtime
- Does not persist data beyond the period package retention
- Does not aggregate across multiple GoDoxy instances
### Stability
Internal package. Data format and API are stable.
## Public API
### Exported Types
#### StatusByAlias
```go
type StatusByAlias struct {
Map map[string]routes.HealthInfoWithoutDetail `json:"statuses"`
Timestamp int64 `json:"timestamp"`
}
```
Container for health status of all routes at a specific time.
#### Status
```go
type Status struct {
Status types.HealthStatus `json:"status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
Latency int32 `json:"latency"`
Timestamp int64 `json:"timestamp"`
}
```
Individual route status at a point in time.
#### RouteAggregate
```go
type RouteAggregate struct {
Alias string `json:"alias"`
DisplayName string `json:"display_name"`
Uptime float32 `json:"uptime"`
Downtime float32 `json:"downtime"`
Idle float32 `json:"idle"`
AvgLatency float32 `json:"avg_latency"`
IsDocker bool `json:"is_docker"`
IsExcluded bool `json:"is_excluded"`
CurrentStatus types.HealthStatus `json:"current_status" swaggertype:"string" enums:"healthy,unhealthy,unknown,napping,starting"`
Statuses []Status `json:"statuses"`
}
```
Aggregated statistics for a single route.
#### Aggregated
```go
type Aggregated []RouteAggregate
```
Slice of route aggregates, sorted alphabetically by alias.
### Exported Variables
#### Poller
```go
var Poller = period.NewPoller("uptime", getStatuses, aggregateStatuses)
```
Pre-configured poller for uptime metrics. Start with `Poller.Start()`.
### Unexported Functions
#### getStatuses
```go
func getStatuses(ctx context.Context, _ StatusByAlias) (StatusByAlias, error)
```
Collects current status of all routes. Called by the period poller every second.
**Returns:**
- `StatusByAlias` - Map of all route statuses with current timestamp
- `error` - Always nil (errors are logged internally)
#### aggregateStatuses
```go
func aggregateStatuses(entries []StatusByAlias, query url.Values) (int, Aggregated)
```
Aggregates status entries into route statistics.
**Query Parameters:**
- `period` - Time filter (5m, 15m, 1h, 1d, 1mo)
- `limit` - Maximum number of routes to return (0 = all)
- `offset` - Offset for pagination
- `keyword` - Fuzzy search keyword for filtering routes
**Returns:**
- `int` - Total number of routes matching the query
- `Aggregated` - Slice of route aggregates
## Architecture
### Core Components
```mermaid
flowchart TD
subgraph "Health Monitoring"
Routes[Routes] -->|GetHealthInfoWithoutDetail| Status[Status Map]
Status -->|Polls every| Second[1 Second]
end
subgraph Poller
Poll[getStatuses] -->|Collects| StatusByAlias
StatusByAlias -->|Stores in| Period[Period StatusByAlias]
end
subgraph Aggregation
Query[Query Params] -->|Filters| Aggregate[aggregateStatuses]
Aggregate -->|Calculates| RouteAggregate
RouteAggregate -->|Uptime| UP[Uptime %]
RouteAggregate -->|Downtime| DOWN[Downtime %]
RouteAggregate -->|Idle| IDLE[Idle %]
RouteAggregate -->|Latency| LAT[Avg Latency]
end
subgraph Response
RouteAggregate -->|JSON| Client[API Client]
end
```
### Data Flow
```mermaid
sequenceDiagram
participant Routes as Route Registry
participant Poller as Uptime Poller
participant Period as Period Storage
participant API as HTTP API
Routes->>Poller: GetHealthInfoWithoutDetail()
Poller->>Period: Add(StatusByAlias)
loop Every second
Poller->>Routes: Collect status
Poller->>Period: Store status
end
API->>Period: Get(filter)
Period-->>API: Entries
API->>API: aggregateStatuses()
API-->>Client: Aggregated JSON
```
### Status Types
| Status | Description | Counted as Uptime? |
| ----------- | ------------------------------ | ------------------ |
| `healthy` | Route is responding normally | Yes |
| `unhealthy` | Route is not responding | No |
| `unknown` | Status could not be determined | Excluded |
| `napping` | Route is in idle/sleep state | Idle (separate) |
| `starting` | Route is starting up | Idle (separate) |
### Calculation Formula
For a set of status entries:
```
Uptime = healthy_count / total_count
Downtime = unhealthy_count / total_count
Idle = (napping_count + starting_count) / total_count
AvgLatency = sum(latency) / count
```
Note: `unknown` statuses are excluded from all calculations.
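The formula can be written out directly; `unknown` entries are dropped before the denominator is taken (an illustrative sketch of the stated rules, not the package's actual code):

```go
package main

import "fmt"

type stats struct{ uptime, downtime, idle float32 }

// aggregate applies the calculation rules above to raw status counts.
func aggregate(healthy, unhealthy, napping, starting, unknown int) stats {
	total := healthy + unhealthy + napping + starting // unknown excluded
	if total == 0 {
		return stats{}
	}
	return stats{
		uptime:   float32(healthy) / float32(total),
		downtime: float32(unhealthy) / float32(total),
		idle:     float32(napping+starting) / float32(total),
	}
}

func main() {
	s := aggregate(98, 2, 0, 0, 5) // the 5 unknown samples are ignored
	fmt.Printf("uptime=%.2f downtime=%.2f idle=%.2f\n", s.uptime, s.downtime, s.idle)
}
```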
## Configuration Surface
No explicit configuration. The poller uses period package defaults:
| Parameter | Value |
| ------------- | ---------------------------- |
| Poll Interval | 1 second |
| Retention | 5m, 15m, 1h, 1d, 1mo periods |
## Dependency and Integration Map
### Internal Dependencies
| Package | Purpose |
| ------------------------- | --------------------- |
| `internal/route/routes` | Health info retrieval |
| `internal/metrics/period` | Time-bucketed storage |
| `internal/types` | HealthStatus enum |
| `internal/metrics/utils` | Query utilities |
### External Dependencies
| Dependency | Purpose |
| ---------------------------------------- | ---------------- |
| `github.com/lithammer/fuzzysearch/fuzzy` | Keyword matching |
| `github.com/bytedance/sonic` | JSON marshaling |
### Integration Points
- Route health monitors provide status via `routes.GetHealthInfoWithoutDetail()`
- Period poller handles data collection and storage
- HTTP API provides query interface via `Poller.ServeHTTP`
## Observability
### Logs
Poller lifecycle and errors are logged via zerolog.
### Metrics
No metrics exposed directly. Status data available via API.
## Failure Modes and Recovery
| Failure | Detection | Recovery |
| -------------------------------- | --------------------------------- | ------------------------------ |
| Route health monitor unavailable | Empty map returned | Log warning, continue |
| Invalid query parameters | `aggregateStatuses` returns empty | Return empty result |
| Poller panic | Goroutine crash | Process terminates |
| Persistence failure | Load/save error | Log, continue with empty state |
### Fuzzy Search
The package uses `fuzzy.MatchFold` for keyword matching:
- Case-insensitive matching
- Substring matching
- Fuzzy ranking
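`fuzzy.MatchFold` reports whether the keyword's characters appear in order (not necessarily contiguously) in the target, ignoring case. A stdlib sketch of that matching rule (illustrative only; the package itself uses `github.com/lithammer/fuzzysearch/fuzzy`):

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// matchFold reports whether all runes of keyword appear in order in target,
// case-insensitively — the matching rule that fuzzy.MatchFold implements.
func matchFold(keyword, target string) bool {
	keyword = strings.ToLower(keyword)
	target = strings.ToLower(target)
	for _, r := range keyword {
		i := strings.IndexRune(target, r)
		if i < 0 {
			return false
		}
		target = target[i+utf8.RuneLen(r):] // continue searching after the match
	}
	return true
}

func main() {
	fmt.Println(matchFold("api", "my-API-server")) // true
	fmt.Println(matchFold("dkr", "docker-nginx"))  // true: d…k…r appear in order
	fmt.Println(matchFold("xyz", "docker-nginx"))  // false
}
```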
## Usage Examples
### Starting the Poller
```go
import "github.com/yusing/godoxy/internal/metrics/uptime"
func init() {
uptime.Poller.Start()
}
```
### HTTP Endpoint
```go
import (
"github.com/gin-gonic/gin"
"github.com/yusing/godoxy/internal/metrics/uptime"
)
func setupUptimeAPI(r *gin.Engine) {
r.GET("/api/uptime", uptime.Poller.ServeHTTP)
}
```
**API Examples:**
```bash
# Get latest status
curl http://localhost:8080/api/uptime
# Get 1-hour history
curl "http://localhost:8080/api/uptime?period=1h"
# Get with limit and offset (pagination)
curl "http://localhost:8080/api/uptime?limit=10&offset=0"
# Search for routes containing "api"
curl "http://localhost:8080/api/uptime?keyword=api"
# Combined query
curl "http://localhost:8080/api/uptime?period=1d&limit=20&offset=0&keyword=docker"
```
### WebSocket Streaming
```javascript
const ws = new WebSocket(
"ws://localhost:8080/api/uptime?period=1m&interval=5s"
);
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
data.data.forEach((route) => {
console.log(`${route.display_name}: ${route.uptime * 100}% uptime`);
});
};
```
### Direct Data Access
```go
// Get entries for the last hour
entries, ok := uptime.Poller.Get(period.MetricsPeriod1h)
if !ok {
	return // no data for this period yet
}
for _, entry := range entries {
for alias, status := range entry.Map {
fmt.Printf("Route %s: %s (latency: %dms)\n",
alias, status.Status, status.Latency.Milliseconds())
}
}
// Get aggregated statistics. aggregateStatuses is unexported, so this
// only compiles inside the uptime package; external callers use the HTTP API.
_, agg := aggregateStatuses(entries, url.Values{
    "period": []string{"1h"},
})
for _, route := range agg {
fmt.Printf("%s: %.1f%% uptime, %.1fms avg latency\n",
route.DisplayName, route.Uptime*100, route.AvgLatency)
}
```
### Response Format
**Latest Status Response:**
```json
{
"alias1": {
"status": "healthy",
"latency": 45
},
"alias2": {
"status": "unhealthy",
"latency": 0
}
}
```
**Aggregated Response:**
```json
{
"total": 5,
"data": [
{
"alias": "api-server",
"display_name": "API Server",
"uptime": 0.98,
"downtime": 0.02,
"idle": 0.0,
"avg_latency": 45.5,
"is_docker": true,
"is_excluded": false,
"current_status": "healthy",
"statuses": [
{ "status": "healthy", "latency": 45, "timestamp": 1704892800 }
]
}
]
}
```
## Performance Characteristics
- O(n) status collection per poll where n = number of routes
- O(m * k) aggregation where m = entries, k = routes
- Memory: O(p * r * s) where p = periods, r = routes, s = status size
- Fuzzy search is O(routes * keyword length)
## Testing Notes
- Mock `routes.GetHealthInfoWithoutDetail()` for testing
- Test aggregation with known status sequences
- Verify pagination and filtering logic
- Test fuzzy search matching
## Related Packages
- `internal/route/routes` - Route health monitoring
- `internal/metrics/period` - Time-bucketed metrics storage
- `internal/types` - Health status types