Metrics Reference

All metric names and attributes follow the OpenTelemetry semantic conventions for hardware and system metrics.

GPU Hardware Telemetry

Collected for each detected GPU on Linux. Availability depends on vendor and GPU model.

Metrics

Metric	Type	Unit	Description	NVIDIA	AMD	Intel
`hw.gpu.utilization`	Gauge	`1`	GPU compute/encoder/decoder utilization (0.0–1.0)	Yes	Yes	—
`hw.gpu.memory.utilization`	Gauge	`1`	Memory controller utilization (0.0–1.0)	Yes	Yes	—
`hw.gpu.memory.limit`	UpDownCounter	`By`	Total GPU memory	Yes	Yes	—
`hw.gpu.memory.usage`	UpDownCounter	`By`	Used GPU memory	Yes	Yes	—
`hw.gpu.memory.free`	UpDownCounter	`By`	Free GPU memory	Yes	Yes	—
`hw.gpu.temperature`	Gauge	`Cel`	Die or memory temperature	Yes	Yes	Yes
`hw.gpu.fan_speed`	Gauge	`{rpm}`	Fan speed	Yes	Yes	Yes*
`hw.gpu.power.draw`	Gauge	`W`	Current power draw	Yes	Yes	Yes
`hw.gpu.power.limit`	Gauge	`W`	Power limit/cap	Yes	Yes	Yes
`hw.gpu.energy.consumed`	Counter	`J`	Cumulative energy consumed	Yes	Yes	Yes
`hw.gpu.clock.graphics`	Gauge	`MHz`	Graphics/SM clock frequency	Yes	Yes	Yes*
`hw.gpu.clock.memory`	Gauge	`MHz`	Memory clock frequency	Yes	Yes	—
`hw.errors`	Counter	`{error}`	ECC and PCIe error count	Yes	—	—

* Intel fan speed requires Linux kernel 6.16+. Intel graphics clock requires the Xe driver.

Attributes

All GPU metrics carry these base attributes:

Attribute	Description	Example
`hw.id`	Unique device identifier (required by spec)	`GPU-a1b2c3d4-5678-...`
`hw.name`	Product name	`NVIDIA A100-SXM4-80GB`
`hw.vendor`	Vendor name	`nvidia`, `amd`, `intel`
`gpu.index`	Zero-based device index	`0`, `1`
`gpu.pci_address`	PCI bus address	`0000:01:00.0`

Additional per-metric attributes:

Metric	Attribute	Values
`hw.gpu.utilization`	`hw.gpu.task`	`general`, `encoder`, `decoder`
`hw.gpu.temperature`	`sensor`	`die`, `memory`
`hw.errors`	`error.type`	`corrected`, `uncorrected`, `pcie_replay`
`hw.errors`	`hw.type`	`gpu`
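
Because per-metric attributes are added on top of the base attributes, one metric can produce several series per GPU. For example, `hw.gpu.utilization` may yield one series for each `hw.gpu.task` value. The Go sketch below illustrates this with plain maps. The `mergeAttrs` and `seriesKey` helpers and the `hw.id` placeholder value are hypothetical, and the rendered key format is only for display; it is not a format the collector emits.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeAttrs returns a new map holding the base attributes plus the
// per-metric extras. Neither input map is modified.
func mergeAttrs(base, extra map[string]string) map[string]string {
	out := make(map[string]string, len(base)+len(extra))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range extra {
		out[k] = v
	}
	return out
}

// seriesKey renders a metric name and its attributes in a stable,
// key-sorted form so that equal attribute sets produce equal strings.
func seriesKey(metric string, attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+attrs[k])
	}
	return metric + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	base := map[string]string{
		"hw.id":           "GPU-example-id", // placeholder, not a real device ID
		"hw.name":         "NVIDIA A100-SXM4-80GB",
		"hw.vendor":       "nvidia",
		"gpu.index":       "0",
		"gpu.pci_address": "0000:01:00.0",
	}
	// One series per hw.gpu.task value listed in the table above.
	for _, task := range []string{"general", "encoder", "decoder"} {
		attrs := mergeAttrs(base, map[string]string{"hw.gpu.task": task})
		fmt.Println(seriesKey("hw.gpu.utilization", attrs))
	}
}
```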

System Metrics

Collected on all platforms (Linux, macOS, Windows) via gopsutil. Follows the OTel semantic conventions for system metrics.

Metric	Type	Unit	Description	Attributes
`system.cpu.utilization`	Gauge	`1`	CPU utilization per logical core (0.0–1.0)	`cpu.logical_number`
`system.cpu.logical.count`	UpDownCounter	`{cpu}`	Number of logical CPU cores
`system.memory.usage`	UpDownCounter	`By`	Memory bytes by state	`system.memory.state`=
`system.memory.utilization`	Gauge	`1`	Memory utilization (0.0–1.0)
`system.disk.io`	Counter	`By`	Disk I/O bytes	`system.device`, `disk.io.direction`=
`system.disk.operations`	Counter	`{operation}`	Disk I/O operations	`system.device`, `disk.io.direction`=
`system.filesystem.usage`	UpDownCounter	`By`	Filesystem space by state	`system.device`, `system.filesystem.mountpoint`, `system.filesystem.type`, `system.filesystem.state`=
`system.filesystem.utilization`	Gauge	`1`	Filesystem utilization (0.0–1.0)	`system.device`, `system.filesystem.mountpoint`, `system.filesystem.type`
`system.network.io`	Counter	`By`	Network I/O bytes	`network.interface.name`, `network.io.direction`=
`system.network.errors`	Counter	`{error}`	Network errors	`network.interface.name`, `network.io.direction`=

The `system.memory.state` values `cached` and `buffers` are only reported on Linux. Loopback interfaces (`lo`, `lo0`) are excluded from network metrics.
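
The loopback exclusion can be pictured as a simple name filter applied before network metrics are recorded. This is a minimal Go sketch that assumes matching on the two interface names mentioned above. The `filterInterfaces` helper is a hypothetical name, and the collector may detect loopback interfaces differently.

```go
package main

import "fmt"

// loopbackNames holds the interface names this page lists as excluded.
var loopbackNames = map[string]bool{"lo": true, "lo0": true}

// filterInterfaces returns the input names with loopback interfaces
// removed, preserving the original order.
func filterInterfaces(names []string) []string {
	kept := make([]string, 0, len(names))
	for _, name := range names {
		if !loopbackNames[name] {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	fmt.Println(filterInterfaces([]string{"lo", "eth0", "lo0", "en0"}))
}
```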

Process Metrics

Self-monitoring of the collector process. Follows the OTel semantic conventions for process metrics.

Metric	Type	Unit	Description	Attributes
`process.cpu.time`	Counter	`s`	Cumulative CPU time	`cpu.mode`=
`process.cpu.utilization`	Gauge	`1`	CPU utilization (0.0–1.0)
`process.memory.usage`	UpDownCounter	`By`	Resident memory (RSS)
`process.memory.virtual`	UpDownCounter	`By`	Virtual memory size
`process.thread.count`	UpDownCounter	`{thread}`	OS thread count
`process.unix.file_descriptor.count`	UpDownCounter	`{file_descriptor}`	Open file descriptors (Linux/macOS)
`process.runtime.go.goroutines`	Gauge	`{goroutine}`	Go goroutine count
`process.runtime.go.mem.heap_alloc`	Gauge	`By`	Go heap memory allocated
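
The two `process.runtime.go.*` values can be read from the Go standard library. The sketch below assumes they correspond to `runtime.NumGoroutine` and the `HeapAlloc` field of `runtime.MemStats`. This page does not say how the collector obtains them, and the `goRuntimeStats` type and `readGoRuntimeStats` function are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
)

// goRuntimeStats is a hypothetical holder for the two Go runtime values.
type goRuntimeStats struct {
	Goroutines     int    // candidate value for process.runtime.go.goroutines
	HeapAllocBytes uint64 // candidate value for process.runtime.go.mem.heap_alloc
}

// readGoRuntimeStats takes a point-in-time snapshot of the runtime.
// runtime.ReadMemStats briefly stops the world, so it should not be
// called in a tight loop.
func readGoRuntimeStats() goRuntimeStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return goRuntimeStats{
		Goroutines:     runtime.NumGoroutine(),
		HeapAllocBytes: m.HeapAlloc,
	}
}

func main() {
	s := readGoRuntimeStats()
	fmt.Printf("goroutines=%d heap_alloc=%d bytes\n", s.Goroutines, s.HeapAllocBytes)
}
```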

eBPF CUDA Metrics (opt-in)

Enable with `OTEL_GPU_EBPF_ENABLED=true`. Requires Linux, the `CAP_BPF` and `CAP_PERFMON` capabilities (or root), and an NVIDIA CUDA runtime (`libcudart.so`). Attaches uprobes to `cudaLaunchKernel`, `cudaMalloc`, and `cudaMemcpy`.

Metric	Type	Unit	Description	Attributes
`gpu.kernel.launch.calls`	Counter	`{call}`	CUDA kernel launch count	`cuda.kernel.name`
`gpu.kernel.grid.size`	Histogram	`{thread}`	Total threads in grid per launch	`cuda.kernel.name`
`gpu.kernel.block.size`	Histogram	`{thread}`	Threads per block per launch	`cuda.kernel.name`
`gpu.memory.allocations`	Counter	`By`	Bytes allocated via cudaMalloc
`gpu.memory.copies`	Histogram	`By`	Bytes per cudaMemcpy call	`cuda.memcpy.kind`=
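
`cudaLaunchKernel` takes the grid and block dimensions as two `dim3` arguments, so the two histogram values can be derived from them. The Go sketch below assumes `gpu.kernel.block.size` is the product of the block dimensions and `gpu.kernel.grid.size` is that value multiplied by the number of blocks in the grid. This reading follows the descriptions in the table, but the collector's exact computation is not stated here, and the `dim3` type and helper names are hypothetical.

```go
package main

import "fmt"

// dim3 mirrors the three-component launch dimensions used by CUDA.
type dim3 struct {
	X, Y, Z uint32
}

// volume multiplies the three components, widening to uint64 first so
// that large launches do not overflow 32-bit arithmetic.
func (d dim3) volume() uint64 {
	return uint64(d.X) * uint64(d.Y) * uint64(d.Z)
}

// threadsPerBlock is the assumed value for gpu.kernel.block.size.
func threadsPerBlock(block dim3) uint64 {
	return block.volume()
}

// totalThreads is the assumed value for gpu.kernel.grid.size:
// blocks in the grid times threads in each block.
func totalThreads(grid, block dim3) uint64 {
	return grid.volume() * block.volume()
}

func main() {
	grid := dim3{X: 128, Y: 1, Z: 1}
	block := dim3{X: 256, Y: 1, Z: 1}
	fmt.Println("threads per block:", threadsPerBlock(block))
	fmt.Println("total threads:", totalThreads(grid, block))
}
```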
