# System Info

Collects and aggregates system metrics including CPU, memory, disk, network, and sensor data with configurable aggregation modes.

## Overview

The systeminfo package uses a custom fork of the [gopsutil](https://github.com/shirou/gopsutil) library to collect system metrics and integrates with the `period` package for time-bucketed storage. It supports collecting CPU, memory, disk, network, and sensor data with configurable collection intervals and aggregation modes for visualization.

### Primary Consumers

- `internal/api/v1/metrics` - HTTP endpoint for system metrics
- `internal/homepage` - Dashboard system monitoring widgets
- Monitoring and alerting systems

### Non-goals

- Does not provide alerting on metric thresholds
- Does not persist metrics beyond the period package retention
- Does not provide data aggregation across multiple instances
- Does not support custom metric collectors

### Stability

Internal package. Data format and API are stable.

## Public API

### Exported Types

#### SystemInfo Struct

```go
type SystemInfo struct {
	Timestamp  int64                           `json:"timestamp"`
	CPUAverage *float64                        `json:"cpu_average"`
	Memory     mem.VirtualMemoryStat           `json:"memory"`
	Disks      map[string]disk.UsageStat       `json:"disks"`
	DisksIO    map[string]*disk.IOCountersStat `json:"disks_io"`
	Network    net.IOCountersStat              `json:"network"`
	Sensors    Sensors                         `json:"sensors"`
}
```

Container for all system metrics at a point in time.

**Fields:**

- `Timestamp` - Unix timestamp of collection
- `CPUAverage` - Average CPU usage percentage (0-100)
- `Memory` - Virtual memory statistics (used, total, percent, etc.)
- `Disks` - Disk usage by partition mountpoint
- `DisksIO` - Disk I/O counters by device name
- `Network` - Network I/O counters for primary interface
- `Sensors` - Hardware temperature sensor readings

#### Sensors Type

```go
type Sensors []sensors.TemperatureStat
```

Slice of temperature sensor readings.
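Because `CPUAverage` is a pointer and its JSON tag has no `omitempty`, a missing value serializes as `null` rather than being dropped. The sketch below illustrates this with a hypothetical, trimmed-down mirror of `SystemInfo` (the `snapshot` type and `encode` helper are not part of the package) so it runs without gopsutil; treating a nil pointer as "CPU data unavailable" is an assumption based on the nil check in the usage examples.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshot is a hypothetical mirror of SystemInfo that keeps only the two
// scalar fields, with the same JSON tags as the real struct.
type snapshot struct {
	Timestamp  int64    `json:"timestamp"`
	CPUAverage *float64 `json:"cpu_average"`
}

// encode marshals a snapshot and returns the JSON text ("" on error).
func encode(s snapshot) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	cpu := 45.5
	// With a value present, cpu_average is a number.
	fmt.Println(encode(snapshot{Timestamp: 1704892800, CPUAverage: &cpu}))
	// With a nil pointer, cpu_average is emitted as null.
	fmt.Println(encode(snapshot{Timestamp: 1704892800}))
}
```

Consumers that chart these values may therefore want to handle `null` explicitly instead of assuming a number is always present.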
#### Aggregated Type

```go
type Aggregated []map[string]any
```

Aggregated data suitable for charting libraries like Recharts. Each entry is a map holding a `timestamp` key plus one or more value keys.

#### SystemInfoAggregateMode Type

```go
type SystemInfoAggregateMode string
```

Aggregation mode constants:

```go
const (
	SystemInfoAggregateModeCPUAverage         SystemInfoAggregateMode = "cpu_average"
	SystemInfoAggregateModeMemoryUsage        SystemInfoAggregateMode = "memory_usage"
	SystemInfoAggregateModeMemoryUsagePercent SystemInfoAggregateMode = "memory_usage_percent"
	SystemInfoAggregateModeDisksReadSpeed     SystemInfoAggregateMode = "disks_read_speed"
	SystemInfoAggregateModeDisksWriteSpeed    SystemInfoAggregateMode = "disks_write_speed"
	SystemInfoAggregateModeDisksIOPS          SystemInfoAggregateMode = "disks_iops"
	SystemInfoAggregateModeDiskUsage          SystemInfoAggregateMode = "disk_usage"
	SystemInfoAggregateModeNetworkSpeed       SystemInfoAggregateMode = "network_speed"
	SystemInfoAggregateModeNetworkTransfer    SystemInfoAggregateMode = "network_transfer"
	SystemInfoAggregateModeSensorTemperature  SystemInfoAggregateMode = "sensor_temperature"
)
```

### Exported Variables

#### Poller

```go
var Poller = period.NewPoller("system_info", getSystemInfo, aggregate)
```

Pre-configured poller for system info metrics. Start with `Poller.Start()`.

### Functions

The functions below are unexported; they are wired into `Poller` and are reached through it.

#### getSystemInfo

```go
func getSystemInfo(ctx context.Context, lastResult *SystemInfo) (*SystemInfo, error)
```

Collects current system metrics. This is the poll function passed to the period poller.
**Features:**

- Concurrent collection of all metric categories
- Handles partial failures gracefully
- Calculates rates based on previous result (for speed metrics)
- Logs warnings for non-critical errors

**Rate Calculations:**

- Disk read/write speed: `(currentBytes - lastBytes) / interval`
- Disk IOPS: `(currentCount - lastCount) / interval`
- Network speed: `(currentBytes - lastBytes) / interval`

#### aggregate

```go
func aggregate(entries []*SystemInfo, query url.Values) (total int, result Aggregated)
```

Aggregates system info entries for a specific mode. Called by the period poller.

**Query Parameters:**

- `aggregate` - The aggregation mode (see constants above)

**Returns:**

- `total` - Number of aggregated entries
- `result` - Slice of maps suitable for charting

## Architecture

### Core Components

```mermaid
flowchart TD
    subgraph Collection
        G[gopsutil] -->|CPU| CPU[CPU Percent]
        G -->|Memory| Mem[Virtual Memory]
        G -->|Disks| Disk[Partitions & IO]
        G -->|Network| Net[Network Counters]
        G -->|Sensors| Sens[Temperature]
    end

    subgraph Poller
        Collect[getSystemInfo] -->|Aggregates| Info[SystemInfo]
        Info -->|Stores in| Period[Period SystemInfo]
    end

    subgraph Aggregation Modes
        CPUAvg[cpu_average]
        MemUsage[memory_usage]
        MemPercent[memory_usage_percent]
        DiskRead[disks_read_speed]
        DiskWrite[disks_write_speed]
        DiskIOPS[disks_iops]
        DiskUsage[disk_usage]
        NetSpeed[network_speed]
        NetTransfer[network_transfer]
        SensorTemp[sensor_temperature]
    end

    Period -->|Query with| Aggregate[aggregate function]
    Aggregate --> CPUAvg
    Aggregate --> MemUsage
    Aggregate --> DiskRead
```

### Data Flow

```mermaid
sequenceDiagram
    participant gopsutil
    participant Poller
    participant Period
    participant API

    Poller->>Poller: Start background goroutine

    loop Every 1 second
        Poller->>gopsutil: Collect CPU (500ms timeout)
        Poller->>gopsutil: Collect Memory
        Poller->>gopsutil: Collect Disks (partition + IO)
        Poller->>gopsutil: Collect Network
        Poller->>gopsutil: Collect Sensors
        gopsutil-->>Poller: SystemInfo
        Poller->>Period: Add(SystemInfo)
    end

    API->>Period: Get(filter)
    Period-->>API: Entries
    API->>API: aggregate(entries, mode)
    API-->>Client: Chart data
```

### Collection Categories

| Category | Data Source                                            | Optional | Rate Metrics          |
| -------- | ------------------------------------------------------ | -------- | --------------------- |
| CPU      | `cpu.PercentWithContext`                               | Yes      | No                    |
| Memory   | `mem.VirtualMemoryWithContext`                         | Yes      | No                    |
| Disks    | `disk.PartitionsWithContext` + `disk.UsageWithContext` | Yes      | Yes (read/write/IOPS) |
| Network  | `net.IOCountersWithContext`                            | Yes      | Yes (upload/download) |
| Sensors  | `sensors.TemperaturesWithContext`                      | Yes      | No                    |

### Aggregation Modes

Each mode produces chart-friendly output:

**CPU Average:**

```json
[
  { "timestamp": 1704892800, "cpu_average": 45.5 },
  { "timestamp": 1704892810, "cpu_average": 52.3 }
]
```

**Memory Usage:**

```json
[
  { "timestamp": 1704892800, "memory_usage": 8388608000 },
  { "timestamp": 1704892810, "memory_usage": 8453440000 }
]
```

**Disk Read/Write Speed:**

```json
[
  { "timestamp": 1704892800, "sda": 10485760, "sdb": 5242880 },
  { "timestamp": 1704892810, "sda": 15728640, "sdb": 4194304 }
]
```

## Configuration Surface

### Disabling Metrics Categories

Metrics categories can be disabled via environment variables:

| Variable                  | Purpose                                     |
| ------------------------- | ------------------------------------------- |
| `METRICS_DISABLE_CPU`     | Set to "true" to disable CPU collection     |
| `METRICS_DISABLE_MEMORY`  | Set to "true" to disable memory collection  |
| `METRICS_DISABLE_DISK`    | Set to "true" to disable disk collection    |
| `METRICS_DISABLE_NETWORK` | Set to "true" to disable network collection |
| `METRICS_DISABLE_SENSORS` | Set to "true" to disable sensor collection  |

## Dependency and Integration Map

### Internal Dependencies

| Package                          | Purpose               |
| -------------------------------- | --------------------- |
| `internal/metrics/period`        | Time-bucketed storage |
| `internal/common`                | Configuration flags   |
| `github.com/yusing/goutils/errs` | Error handling        |

### External Dependencies

| Dependency                      | Purpose                   |
| ------------------------------- | ------------------------- |
| `github.com/shirou/gopsutil/v4` | System metrics collection |
| `github.com/rs/zerolog`         | Logging                   |

### Integration Points

- gopsutil provides raw system metrics
- period package handles storage and persistence
- HTTP API provides query interface

## Observability

### Logs

| Level | When                                       |
| ----- | ------------------------------------------ |
| Warn  | Non-critical errors (e.g., no sensor data) |
| Error | Other errors                               |

### Metrics

No metrics exposed directly. Collection errors are logged.

## Failure Modes and Recovery

| Failure         | Detection                            | Recovery                         |
| --------------- | ------------------------------------ | -------------------------------- |
| No CPU data     | `cpu.Percent` returns error          | Skip and log later with warning  |
| No memory data  | `mem.VirtualMemory` returns error    | Skip and log later with warning  |
| No disk data    | `disk.Usage` returns error for all   | Skip and log later with warning  |
| No network data | `net.IOCounters` returns error       | Skip and log later with warning  |
| No sensor data  | `sensors.Temperatures` returns error | Skip and log later with warning  |
| Context timeout | Context deadline exceeded            | Return partial data with warning |

### Partial Collection

The package uses `gperr.NewGroup` to collect errors from concurrent operations:

```go
errs := gperr.NewGroup("failed to get system info")
errs.Go(func() error { return s.collectCPUInfo(ctx) })
errs.Go(func() error { return s.collectMemoryInfo(ctx) })
// ...
result := errs.Wait()
```

Warnings (like `ENODATA`) are logged but don't fail the collection. Critical errors cause the function to return an error.
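The warning-versus-critical split described above can be illustrated with the standard library alone. The following is a minimal sketch, not the package's implementation: it swaps `gperr.NewGroup` for `sync.WaitGroup` plus `errors.Join`, and the names `collector`, `collectAll`, `runDemo`, and `errNoData` (a stand-in for a condition such as `ENODATA`) are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// errNoData is a hypothetical stand-in for a non-critical condition.
var errNoData = errors.New("no data available")

// collector is a hypothetical signature for one metric category.
type collector func(ctx context.Context) error

// collectAll runs every collector concurrently. Non-critical failures are
// returned as warnings; all other failures are joined into a single error.
func collectAll(ctx context.Context, collectors map[string]collector) (warnings []string, err error) {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		critical []error
	)
	for name, c := range collectors {
		wg.Add(1)
		go func(name string, c collector) {
			defer wg.Done()
			e := c(ctx)
			if e == nil {
				return
			}
			mu.Lock()
			defer mu.Unlock()
			if errors.Is(e, errNoData) {
				warnings = append(warnings, name+": "+e.Error())
				return
			}
			critical = append(critical, fmt.Errorf("%s: %w", name, e))
		}(name, c)
	}
	wg.Wait()
	// errors.Join returns nil when there are no critical errors.
	return warnings, errors.Join(critical...)
}

// runDemo simulates one successful category and one with no data.
func runDemo() ([]string, error) {
	return collectAll(context.Background(), map[string]collector{
		"cpu":     func(context.Context) error { return nil },
		"sensors": func(context.Context) error { return errNoData },
	})
}

func main() {
	warnings, err := runDemo()
	fmt.Println("warnings:", warnings, "error:", err)
}
```

In this sketch a missing sensor reading yields a warning while the overall collection still succeeds, which mirrors the behavior the failure table describes.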
## Usage Examples

### Starting the Poller

```go
import "github.com/yusing/godoxy/internal/metrics/systeminfo"

func init() {
	systeminfo.Poller.Start()
}
```

### HTTP Endpoint

```go
import (
	"github.com/gin-gonic/gin"

	"github.com/yusing/godoxy/internal/metrics/systeminfo"
)

func setupMetricsAPI(r *gin.Engine) {
	r.GET("/api/metrics/system", systeminfo.Poller.ServeHTTP)
}
```

**API Examples:**

```bash
# Get latest metrics
curl http://localhost:8080/api/metrics/system

# Get 1-hour history with CPU aggregation
curl "http://localhost:8080/api/metrics/system?period=1h&aggregate=cpu_average"

# Get 24-hour memory usage history
curl "http://localhost:8080/api/metrics/system?period=1d&aggregate=memory_usage_percent"

# Get disk read speed for the last hour
curl "http://localhost:8080/api/metrics/system?period=1h&aggregate=disks_read_speed"
```

### WebSocket Streaming

```javascript
const ws = new WebSocket(
  "ws://localhost:8080/api/metrics/system?period=1m&interval=5s&aggregate=cpu_average"
);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log("CPU:", data.data);
};
```

### Direct Data Access

```go
// Get entries for the last hour; ok reports whether the lookup succeeded
entries, ok := systeminfo.Poller.Get(period.MetricsPeriod1h)
if ok {
	for _, entry := range entries {
		// CPUAverage is a pointer, so guard against nil before dereferencing
		if entry.CPUAverage != nil {
			fmt.Printf("CPU: %.1f%% at %d\n", *entry.CPUAverage, entry.Timestamp)
		}
	}
}

// Get the most recent metrics
latest := systeminfo.Poller.GetLastResult()
```

### Disabling Metrics at Runtime

```go
import (
	"github.com/yusing/godoxy/internal/common"
	"github.com/yusing/godoxy/internal/metrics/systeminfo"
)

func init() {
	// Disable expensive sensor collection
	common.MetricsDisableSensors = true
	systeminfo.Poller.Start()
}
```

## Performance Characteristics

- O(1) per metric collection (gopsutil handles complexity)
- Concurrent collection of all categories
- Rate calculations O(n) where n = number of disks/interfaces
- Memory: `O(5 * 100 * sizeof(SystemInfo))`
- JSON serialization O(n) for API responses

### Collection Latency

| Category | Typical Latency                        |
| -------- | -------------------------------------- |
| CPU      | ~10-50ms                               |
| Memory   | ~5-10ms                                |
| Disks    | ~10-100ms (depends on partition count) |
| Network  | ~5-10ms                                |
| Sensors  | ~10-50ms                               |

## Testing Notes

- Mock gopsutil calls for unit tests
- Test with real metrics to verify rate calculations
- Test aggregation modes with various data sets
- Verify disable flags work correctly
- Test partial failure scenarios

## Related Packages

- `internal/metrics/period` - Time-bucketed storage
- `internal/api/v1/metrics` - HTTP API endpoints
- `github.com/shirou/gopsutil/v4` - System metrics library
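
### Rate Calculation Sketch

As a companion to the testing notes and the rate formulas listed under `getSystemInfo`, the sketch below shows `(current - last) / interval` applied to two counter snapshots. The `ioSample` type and `rate` helper are hypothetical, and the guards (returning 0 for a non-positive interval or a counter that went backwards) are assumptions rather than documented package behavior.

```go
package main

import "fmt"

// ioSample is a hypothetical, trimmed-down counter snapshot.
type ioSample struct {
	Timestamp int64  // Unix seconds
	ReadBytes uint64 // cumulative bytes read
}

// rate returns (current - last) / interval in units per second.
// It returns 0 when the interval is not positive or the counter decreased
// (for example after a reset), which avoids division by zero and uint64
// underflow.
func rate(current, last uint64, intervalSeconds float64) float64 {
	if intervalSeconds <= 0 || current < last {
		return 0
	}
	return float64(current-last) / intervalSeconds
}

func main() {
	last := ioSample{Timestamp: 1704892800, ReadBytes: 1_000_000}
	cur := ioSample{Timestamp: 1704892801, ReadBytes: 11_485_760}

	interval := float64(cur.Timestamp - last.Timestamp)
	fmt.Printf("read speed: %.0f bytes/s\n", rate(cur.ReadBytes, last.ReadBytes, interval))
}
```

A unit test for the real collector could feed two synthetic snapshots like these and compare the computed speed with the expected value.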