System Info

Collects and aggregates system metrics including CPU, memory, disk, network, and sensor data with configurable aggregation modes.

Overview

The systeminfo package uses a custom fork of the gopsutil library to collect system metrics and integrates with the period package for time-bucketed storage. It collects CPU, memory, disk, network, and sensor data at a configurable interval and offers aggregation modes that shape the stored samples for visualization.

Primary Consumers

  • internal/api/v1/metrics - HTTP endpoint for system metrics
  • internal/homepage - Dashboard system monitoring widgets
  • Monitoring and alerting systems

Non-goals

  • Does not provide alerting on metric thresholds
  • Does not persist metrics beyond the period package retention
  • Does not provide data aggregation across multiple instances
  • Does not support custom metric collectors

Stability

Internal package. Data format and API are stable.

Public API

Exported Types

SystemInfo Struct

type SystemInfo struct {
    Timestamp  int64                           `json:"timestamp"`
    CPUAverage *float64                        `json:"cpu_average"`
    Memory     mem.VirtualMemoryStat           `json:"memory"`
    Disks      map[string]disk.UsageStat       `json:"disks"`
    DisksIO    map[string]*disk.IOCountersStat `json:"disks_io"`
    Network    net.IOCountersStat              `json:"network"`
    Sensors    Sensors                         `json:"sensors"`
}

Container for all system metrics at a point in time.

Fields:

  • Timestamp - Unix timestamp of collection
  • CPUAverage - Average CPU usage percentage (0-100)
  • Memory - Virtual memory statistics (used, total, percent, etc.)
  • Disks - Disk usage by partition mountpoint
  • DisksIO - Disk I/O counters by device name
  • Network - Network I/O counters for primary interface
  • Sensors - Hardware temperature sensor readings
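
Because CPUAverage is a pointer, a sample without a CPU reading would be expected to serialize as null rather than 0. The following is a minimal sketch of that behavior using cpuOnly, a hypothetical trimmed-down mirror of the struct introduced here for illustration only; it is not the package's real type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cpuOnly is a hypothetical, trimmed-down mirror of SystemInfo used only for
// illustration; the real struct also carries the gopsutil types listed above.
type cpuOnly struct {
	Timestamp  int64    `json:"timestamp"`
	CPUAverage *float64 `json:"cpu_average"`
}

// encode marshals a sample to its JSON text, returning "" on failure.
func encode(s cpuOnly) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	usage := 45.5
	fmt.Println(encode(cpuOnly{Timestamp: 1704892800, CPUAverage: &usage}))
	fmt.Println(encode(cpuOnly{Timestamp: 1704892810})) // no CPU reading: null
}
```

Consumers that read the JSON should therefore treat cpu_average as nullable.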

Sensors Type

type Sensors []sensors.TemperatureStat

Slice of temperature sensor readings.

Aggregated Type

type Aggregated []map[string]any

Aggregated data suitable for charting libraries like Recharts. Each entry is a map with timestamp and values.

SystemInfoAggregateMode Type

type SystemInfoAggregateMode string

Aggregation mode constants:

const (
    SystemInfoAggregateModeCPUAverage         SystemInfoAggregateMode = "cpu_average"
    SystemInfoAggregateModeMemoryUsage        SystemInfoAggregateMode = "memory_usage"
    SystemInfoAggregateModeMemoryUsagePercent SystemInfoAggregateMode = "memory_usage_percent"
    SystemInfoAggregateModeDisksReadSpeed     SystemInfoAggregateMode = "disks_read_speed"
    SystemInfoAggregateModeDisksWriteSpeed    SystemInfoAggregateMode = "disks_write_speed"
    SystemInfoAggregateModeDisksIOPS          SystemInfoAggregateMode = "disks_iops"
    SystemInfoAggregateModeDiskUsage          SystemInfoAggregateMode = "disk_usage"
    SystemInfoAggregateModeNetworkSpeed       SystemInfoAggregateMode = "network_speed"
    SystemInfoAggregateModeNetworkTransfer    SystemInfoAggregateMode = "network_transfer"
    SystemInfoAggregateModeSensorTemperature  SystemInfoAggregateMode = "sensor_temperature"
)
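
A caller that receives the mode as a query string may want to validate it before use. The sketch below assumes a hypothetical helper, parseMode, and a lookup table, knownModes; neither is part of the package, and the real code may validate modes differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// SystemInfoAggregateMode is redeclared here so the example is self-contained.
type SystemInfoAggregateMode string

// knownModes lists the mode strings documented above (hypothetical table).
var knownModes = map[SystemInfoAggregateMode]bool{
	"cpu_average":          true,
	"memory_usage":         true,
	"memory_usage_percent": true,
	"disks_read_speed":     true,
	"disks_write_speed":    true,
	"disks_iops":           true,
	"disk_usage":           true,
	"network_speed":        true,
	"network_transfer":     true,
	"sensor_temperature":   true,
}

// parseMode is a hypothetical helper that converts a raw query value into a
// mode and reports whether it is one of the documented constants.
func parseMode(raw string) (SystemInfoAggregateMode, bool) {
	mode := SystemInfoAggregateMode(raw)
	return mode, knownModes[mode]
}

func main() {
	query, err := url.ParseQuery("period=1h&aggregate=cpu_average")
	if err != nil {
		panic(err)
	}
	mode, ok := parseMode(query.Get("aggregate"))
	fmt.Println(mode, ok)
}
```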

Exported Variables

Poller

var Poller = period.NewPoller("system_info", getSystemInfo, aggregate)

Pre-configured poller for system info metrics. Start with Poller.Start().

Poller Functions (unexported)

getSystemInfo

func getSystemInfo(ctx context.Context, lastResult *SystemInfo) (*SystemInfo, error)

Collects current system metrics. This is the poll function passed to the period poller.

Features:

  • Concurrent collection of all metric categories
  • Handles partial failures gracefully
  • Calculates rates based on previous result (for speed metrics)
  • Logs warnings for non-critical errors

Rate Calculations:

  • Disk read/write speed: (currentBytes - lastBytes) / interval
  • Disk IOPS: (currentCount - lastCount) / interval
  • Network speed: (currentBytes - lastBytes) / interval
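
The rate formulas above can be sketched as a single helper. ratePerSecond is a hypothetical name, and returning 0 when the counter goes backwards (for example after a device reset) is an assumption of this sketch rather than documented behavior.

```go
package main

import "fmt"

// ratePerSecond applies (current - last) / interval to a cumulative counter.
// It returns 0 when no time has elapsed or when the counter moved backwards,
// which this sketch treats as a counter reset (an assumption).
func ratePerSecond(last, current uint64, intervalSec float64) float64 {
	if intervalSec <= 0 || current < last {
		return 0
	}
	return float64(current-last) / intervalSec
}

func main() {
	// Two hypothetical readings of a disk's cumulative read bytes, 2s apart.
	last, current := uint64(10485760), uint64(31457280)
	fmt.Printf("read speed: %.0f B/s\n", ratePerSecond(last, current, 2))
}
```

The same helper works for IOPS by passing operation counts instead of byte counts.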

aggregate

func aggregate(entries []*SystemInfo, query url.Values) (total int, result Aggregated)

Aggregates system info entries for a specific mode. Called by the period poller.

Query Parameters:

  • aggregate - The aggregation mode (see constants above)

Returns:

  • total - Number of aggregated entries
  • result - Slice of maps suitable for charting
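
To make the return shape concrete, here is a minimal sketch of what the cpu_average mode might do. The sample type and aggregateCPU function are hypothetical stand-ins, and skipping entries without a CPU reading is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is a hypothetical stand-in for SystemInfo with only the fields
// this mode needs.
type sample struct {
	Timestamp  int64
	CPUAverage *float64
}

// Aggregated matches the documented type: one map per chart point.
type Aggregated []map[string]any

// aggregateCPU builds one chart point per entry that has a CPU reading and
// returns the number of points alongside them.
func aggregateCPU(entries []*sample) (int, Aggregated) {
	result := make(Aggregated, 0, len(entries))
	for _, e := range entries {
		if e == nil || e.CPUAverage == nil {
			continue // assumption: entries without CPU data are dropped
		}
		result = append(result, map[string]any{
			"timestamp":   e.Timestamp,
			"cpu_average": *e.CPUAverage,
		})
	}
	return len(result), result
}

func main() {
	a, b := 45.5, 52.3
	total, rows := aggregateCPU([]*sample{
		{Timestamp: 1704892800, CPUAverage: &a},
		{Timestamp: 1704892810, CPUAverage: &b},
	})
	out, _ := json.Marshal(rows)
	fmt.Println(total, string(out))
}
```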

Architecture

Core Components

flowchart TD
    subgraph Collection
        G[gopsutil] -->|CPU| CPU[CPU Percent]
        G -->|Memory| Mem[Virtual Memory]
        G -->|Disks| Disk[Partitions & IO]
        G -->|Network| Net[Network Counters]
        G -->|Sensors| Sens[Temperature]
    end

    subgraph Poller
        Collect[getSystemInfo] -->|Aggregates| Info[SystemInfo]
        Info -->|Stores in| Period[Period SystemInfo]
    end

    subgraph Aggregation Modes
        CPUAvg[cpu_average]
        MemUsage[memory_usage]
        MemPercent[memory_usage_percent]
        DiskRead[disks_read_speed]
        DiskWrite[disks_write_speed]
        DiskIOPS[disks_iops]
        DiskUsage[disk_usage]
        NetSpeed[network_speed]
        NetTransfer[network_transfer]
        SensorTemp[sensor_temperature]
    end

    Period -->|Query with| Aggregate[aggregate function]
    Aggregate --> CPUAvg
    Aggregate --> MemUsage
    Aggregate --> DiskRead

Data Flow

sequenceDiagram
    participant gopsutil
    participant Poller
    participant Period
    participant API

    Poller->>Poller: Start background goroutine

    loop Every 1 second
        Poller->>gopsutil: Collect CPU (500ms timeout)
        Poller->>gopsutil: Collect Memory
        Poller->>gopsutil: Collect Disks (partition + IO)
        Poller->>gopsutil: Collect Network
        Poller->>gopsutil: Collect Sensors

        gopsutil-->>Poller: SystemInfo
        Poller->>Period: Add(SystemInfo)
    end

    API->>Period: Get(filter)
    Period-->>API: Entries
    API->>API: aggregate(entries, mode)
    API-->>Client: Chart data
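
The polling loop in the diagram above can be sketched with a ticker and a per-poll timeout. Everything here is illustrative: sample, pollFunc, runPoller, and demo are hypothetical names, and the sketch applies the 500ms bound to the whole poll, whereas the diagram shows it for CPU collection only.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sample is a hypothetical stand-in for SystemInfo.
type sample struct {
	Seq int
}

// pollFunc has the same shape as getSystemInfo: it receives the previous result.
type pollFunc func(ctx context.Context, last *sample) (*sample, error)

// runPoller calls poll once per tick until ctx is cancelled. Each successful
// result is handed to store and passed back in as "last" on the next tick.
func runPoller(ctx context.Context, interval time.Duration, poll pollFunc, store func(*sample)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var last *sample
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if ctx.Err() != nil {
				return
			}
			// Bound a single collection so a slow source cannot stall the loop.
			pollCtx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
			cur, err := poll(pollCtx, last)
			cancel()
			if err != nil {
				continue // keep the previous result and retry on the next tick
			}
			store(cur)
			last = cur
		}
	}
}

// demo polls every millisecond and stops itself after three samples.
func demo() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	stored := 0
	poll := func(_ context.Context, last *sample) (*sample, error) {
		seq := 1
		if last != nil {
			seq = last.Seq + 1
		}
		if seq == 3 {
			cancel() // stop the loop once the third sample is produced
		}
		return &sample{Seq: seq}, nil
	}
	runPoller(ctx, time.Millisecond, poll, func(*sample) { stored++ })
	return stored
}

func main() {
	fmt.Println("samples stored:", demo())
}
```

Passing the previous result back into the poll function is what makes the rate metrics possible, since each rate needs two consecutive counter readings.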

Collection Categories

| Category | Data Source | Optional | Rate Metrics |
|----------|-------------|----------|--------------|
| CPU | cpu.PercentWithContext | Yes | No |
| Memory | mem.VirtualMemoryWithContext | Yes | No |
| Disks | disk.PartitionsWithContext + disk.UsageWithContext | Yes | Yes (read/write/IOPS) |
| Network | net.IOCountersWithContext | Yes | Yes (upload/download) |
| Sensors | sensors.TemperaturesWithContext | Yes | No |

Aggregation Modes

Each mode produces chart-friendly output:

CPU Average:

[
  { "timestamp": 1704892800, "cpu_average": 45.5 },
  { "timestamp": 1704892810, "cpu_average": 52.3 }
]

Memory Usage:

[
  { "timestamp": 1704892800, "memory_usage": 8388608000 },
  { "timestamp": 1704892810, "memory_usage": 8453440000 }
]

Disk Read/Write Speed:

[
  { "timestamp": 1704892800, "sda": 10485760, "sdb": 5242880 },
  { "timestamp": 1704892810, "sda": 15728640, "sdb": 4194304 }
]
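
Unlike the single-value modes, the disk speed output uses one key per device in each row. A minimal sketch of that "wide" layout follows; diskSample, its ReadSpeed field, and toChartRows are hypothetical names chosen for this example and do not reflect the package's internal representation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// diskSample is a hypothetical snapshot: read speed in bytes per second,
// keyed by device name.
type diskSample struct {
	Timestamp int64
	ReadSpeed map[string]float64
}

// toChartRows flattens each sample into a single row holding the timestamp
// plus one key per device, so a charting library can draw one series per key.
func toChartRows(samples []diskSample) []map[string]any {
	rows := make([]map[string]any, 0, len(samples))
	for _, s := range samples {
		row := make(map[string]any, len(s.ReadSpeed)+1)
		row["timestamp"] = s.Timestamp
		for device, speed := range s.ReadSpeed {
			row[device] = speed
		}
		rows = append(rows, row)
	}
	return rows
}

func main() {
	rows := toChartRows([]diskSample{
		{Timestamp: 1704892800, ReadSpeed: map[string]float64{"sda": 10485760, "sdb": 5242880}},
		{Timestamp: 1704892810, ReadSpeed: map[string]float64{"sda": 15728640, "sdb": 4194304}},
	})
	out, _ := json.Marshal(rows)
	fmt.Println(string(out))
}
```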

Configuration Surface

Disabling Metrics Categories

Metrics categories can be disabled via environment variables:

| Variable | Purpose |
|----------|---------|
| METRICS_DISABLE_CPU | Set to "true" to disable CPU collection |
| METRICS_DISABLE_MEMORY | Set to "true" to disable memory collection |
| METRICS_DISABLE_DISK | Set to "true" to disable disk collection |
| METRICS_DISABLE_NETWORK | Set to "true" to disable network collection |
| METRICS_DISABLE_SENSORS | Set to "true" to disable sensor collection |
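
As a rough illustration of how such flags could be read, the sketch below checks each variable against the literal "true". The disabled helper is hypothetical; the real flags live in internal/common and may accept other spellings or be parsed differently.

```go
package main

import (
	"fmt"
	"os"
)

// disabled reports whether METRICS_DISABLE_<category> is set to "true".
// The lookup function is injected so the check can be exercised without
// touching the real process environment.
func disabled(category string, getenv func(string) string) bool {
	return getenv("METRICS_DISABLE_"+category) == "true"
}

func main() {
	for _, category := range []string{"CPU", "MEMORY", "DISK", "NETWORK", "SENSORS"} {
		fmt.Printf("%-8s disabled=%v\n", category, disabled(category, os.Getenv))
	}
}
```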

Dependency and Integration Map

Internal Dependencies

| Package | Purpose |
|---------|---------|
| internal/metrics/period | Time-bucketed storage |
| internal/common | Configuration flags |
| github.com/yusing/goutils/errs | Error handling |

External Dependencies

| Dependency | Purpose |
|------------|---------|
| github.com/shirou/gopsutil/v4 | System metrics collection |
| github.com/rs/zerolog | Logging |

Integration Points

  • gopsutil provides raw system metrics
  • period package handles storage and persistence
  • HTTP API provides query interface

Observability

Logs

| Level | When |
|-------|------|
| Warn | Non-critical errors (e.g., no sensor data) |
| Error | Other errors |

Metrics

No metrics exposed directly. Collection errors are logged.

Failure Modes and Recovery

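| Failure | Detection | Recovery |
|---------|-----------|----------|
| No CPU data | cpu.Percent returns error | Category is skipped; a warning is logged after collection |
| No memory data | mem.VirtualMemory returns error | Category is skipped; a warning is logged after collection |
| No disk data | disk.Usage returns error for all partitions | Category is skipped; a warning is logged after collection |
| No network data | net.IOCounters returns error | Category is skipped; a warning is logged after collection |
| No sensor data | sensors.Temperatures returns error | Category is skipped; a warning is logged after collection |
| Context timeout | Context deadline exceeded | Partial data is returned and a warning is logged |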

Partial Collection

The package uses gperr.NewGroup to collect errors from concurrent operations:

errs := gperr.NewGroup("failed to get system info")
errs.Go(func() error { return s.collectCPUInfo(ctx) })
errs.Go(func() error { return s.collectMemoryInfo(ctx) })
// ...
result := errs.Wait()

Warnings (like ENODATA) are logged but don't fail the collection. Critical errors cause the function to return an error.
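
The warning-versus-critical split can be illustrated with the standard library alone. This sketch does not use gperr; it swaps in sync.WaitGroup and errors.Join, and errNoData, errIO, and collectAll are hypothetical names, with errNoData standing in for ENODATA.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNoData stands in for ENODATA; errIO is an example of a critical failure.
// Both are hypothetical sentinels for this sketch.
var (
	errNoData = errors.New("no data available")
	errIO     = errors.New("i/o failure")
)

// collectAll runs every collector concurrently, then sorts failures into
// warnings (errNoData) and a joined critical error (nil if there are none).
func collectAll(collectors map[string]func() error) (warnings []error, critical error) {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		crit []error
	)
	for name, collect := range collectors {
		wg.Add(1)
		go func(name string, collect func() error) {
			defer wg.Done()
			err := collect()
			if err == nil {
				return
			}
			err = fmt.Errorf("%s: %w", name, err)
			mu.Lock()
			defer mu.Unlock()
			if errors.Is(err, errNoData) {
				warnings = append(warnings, err) // logged, does not fail the poll
			} else {
				crit = append(crit, err)
			}
		}(name, collect)
	}
	wg.Wait()
	return warnings, errors.Join(crit...)
}

func main() {
	warnings, critical := collectAll(map[string]func() error{
		"cpu":     func() error { return nil },
		"sensors": func() error { return errNoData },
		"disk":    func() error { return errIO },
	})
	fmt.Println("warnings:", warnings)
	fmt.Println("critical:", critical)
}
```

Keeping warnings separate lets a host without temperature sensors still report its other metrics on every poll.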

Usage Examples

Starting the Poller

import "github.com/yusing/godoxy/internal/metrics/systeminfo"

func init() {
    systeminfo.Poller.Start()
}

HTTP Endpoint

import (
    "github.com/gin-gonic/gin"

    "github.com/yusing/godoxy/internal/metrics/systeminfo"
)

func setupMetricsAPI(r *gin.Engine) {
    r.GET("/api/metrics/system", systeminfo.Poller.ServeHTTP)
}

API Examples:

# Get latest metrics
curl http://localhost:8080/api/metrics/system

# Get 1-hour history with CPU aggregation
curl "http://localhost:8080/api/metrics/system?period=1h&aggregate=cpu_average"

# Get 24-hour memory usage history
curl "http://localhost:8080/api/metrics/system?period=1d&aggregate=memory_usage_percent"

# Get disk read speed for the last hour
curl "http://localhost:8080/api/metrics/system?period=1h&aggregate=disks_read_speed"

WebSocket Streaming

const ws = new WebSocket(
  "ws://localhost:8080/api/metrics/system?period=1m&interval=5s&aggregate=cpu_average"
);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log("CPU:", data.data);
};

Direct Data Access

// Get entries for the last hour
entries, ok := systeminfo.Poller.Get(period.MetricsPeriod1h)
if ok {
    for _, entry := range entries {
        // CPUAverage is a pointer and may be nil
        if entry.CPUAverage != nil {
            fmt.Printf("CPU: %.1f%% at %d\n", *entry.CPUAverage, entry.Timestamp)
        }
    }
}

// Get the most recent metrics
latest := systeminfo.Poller.GetLastResult()

Disabling Metrics at Runtime

import (
    "github.com/yusing/godoxy/internal/common"
    "github.com/yusing/godoxy/internal/metrics/systeminfo"
)

func init() {
    // Disable expensive sensor collection
    common.MetricsDisableSensors = true
    systeminfo.Poller.Start()
}

Performance Characteristics

  • O(1) per metric collection (gopsutil handles complexity)
  • Concurrent collection of all categories
  • Rate calculations O(n) where n = number of disks/interfaces
  • Memory: O(5 * 100 * sizeof(SystemInfo))
  • JSON serialization O(n) for API responses

Collection Latency

| Category | Typical Latency |
|----------|-----------------|
| CPU | ~10-50ms |
| Memory | ~5-10ms |
| Disks | ~10-100ms (depends on partition count) |
| Network | ~5-10ms |
| Sensors | ~10-50ms |

Testing Notes

  • Mock gopsutil calls for unit tests
  • Test with real metrics to verify rate calculations
  • Test aggregation modes with various data sets
  • Verify disable flags work correctly
  • Test partial failure scenarios

Related Packages

  • internal/metrics/period - Time-bucketed storage
  • internal/api/v1/metrics - HTTP API endpoints
  • github.com/shirou/gopsutil/v4 - System metrics library