The collector reads AMD GPU telemetry directly from the Linux kernel's sysfs and hwmon interfaces. It needs no ROCm installation, no user-space libraries, and no drivers beyond the standard amdgpu kernel module.
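To make the sysfs approach concrete, here is a minimal sketch of how AMD GPUs can be discovered and read without any vendor library. It is not the collector's actual implementation: the helper names `find_amd_gpus` and `read_int` are hypothetical, and the sketch assumes the usual kernel layout where each GPU appears as `/sys/class/drm/cardN` and its `device/vendor` file holds the PCI vendor ID (`0x1002` for AMD).

```python
from pathlib import Path

AMD_VENDOR_ID = "0x1002"  # PCI vendor ID assigned to AMD


def find_amd_gpus(drm_root="/sys/class/drm"):
    """Return the sysfs device directories of AMD GPUs found under drm_root."""
    devices = []
    for card in sorted(Path(drm_root).glob("card[0-9]*")):
        # Skip connector entries such as card0-DP-1; only cardN is a GPU.
        if "-" in card.name:
            continue
        vendor_file = card / "device" / "vendor"
        if vendor_file.is_file() and vendor_file.read_text().strip() == AMD_VENDOR_ID:
            devices.append(card / "device")
    return devices


def read_int(path):
    """Read one integer from a sysfs file; return None if missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

On a host with the amdgpu driver loaded, `read_int(device / "gpu_busy_percent")` for each entry returned by `find_amd_gpus()` should give the current load as a percentage. Not every file exists on every GPU generation, which is why `read_int` returns `None` rather than raising.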

Requirements

  • Linux with the amdgpu kernel driver
  • Kernel 5.0 or later (the sysfs/hwmon paths are stable from 5.x onwards)
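Both requirements can be checked before deploying. The following is a rough preflight sketch, not something shipped with the collector: `kernel_major` and `check_requirements` are hypothetical names, and the check assumes that a loaded (or built-in) amdgpu driver shows up as the directory `/sys/module/amdgpu`.

```python
import os
import re


def kernel_major(release):
    """Extract the major version from a release string like '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1))


def check_requirements(release=None, module_dir="/sys/module/amdgpu"):
    """Return a list of problems; an empty list means both checks passed."""
    release = release or os.uname().release
    problems = []
    if kernel_major(release) < 5:
        problems.append(f"kernel {release} is older than 5.x")
    if not os.path.isdir(module_dir):
        problems.append("amdgpu kernel module does not appear to be loaded")
    return problems
```

Calling `check_requirements()` with no arguments inspects the running host. The parameters exist only so the logic can be exercised against other release strings or directories.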

Collected metrics

Metric                      Description
hw.gpu.utilization          Compute utilization (0.0–1.0)
hw.gpu.memory.utilization   Memory controller utilization (0.0–1.0)
hw.gpu.memory.limit         Total VRAM (bytes)
hw.gpu.memory.usage         Used VRAM (bytes)
hw.gpu.memory.free          Free VRAM (bytes)
hw.gpu.temperature          Die temperature (°C)
hw.gpu.fan_speed            Fan speed (RPM)
hw.gpu.power.draw           Current power draw (W)
hw.gpu.power.limit          Power cap (W)
hw.gpu.energy.consumed      Cumulative energy (J)
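The kernel does not report values in the units listed above: amdgpu busy files are percentages (0 to 100), hwmon temperatures are in millidegrees Celsius, and hwmon power values are in microwatts. The sketch below shows one plausible conversion. The mapping from file names to metrics is an assumption made for illustration, not a statement of which files the collector reads, and `to_metrics` is a hypothetical helper. The energy counter is left out because the sketch does not assume a particular source for it.

```python
# (metric name, raw sysfs/hwmon file name, divisor to reach the metric's unit)
CONVERSIONS = [
    ("hw.gpu.utilization", "gpu_busy_percent", 100),         # percent -> ratio
    ("hw.gpu.memory.utilization", "mem_busy_percent", 100),  # percent -> ratio
    ("hw.gpu.memory.limit", "mem_info_vram_total", 1),       # bytes
    ("hw.gpu.memory.usage", "mem_info_vram_used", 1),        # bytes
    ("hw.gpu.temperature", "temp1_input", 1000),             # millidegrees C -> C
    ("hw.gpu.fan_speed", "fan1_input", 1),                   # RPM
    ("hw.gpu.power.draw", "power1_average", 1_000_000),      # microwatts -> W
    ("hw.gpu.power.limit", "power1_cap", 1_000_000),         # microwatts -> W
]


def to_metrics(raw):
    """Convert a dict of raw integer readings into metric values.

    Readings that are absent or None are skipped, since not every GPU
    exposes every file.
    """
    metrics = {}
    for name, key, divisor in CONVERSIONS:
        value = raw.get(key)
        if value is None:
            continue
        metrics[name] = value if divisor == 1 else value / divisor
    # Free VRAM is derived rather than read directly.
    if "hw.gpu.memory.limit" in metrics and "hw.gpu.memory.usage" in metrics:
        metrics["hw.gpu.memory.free"] = (
            metrics["hw.gpu.memory.limit"] - metrics["hw.gpu.memory.usage"]
        )
    return metrics
```

For example, a raw `temp1_input` of `45000` becomes a `hw.gpu.temperature` of `45.0`, and a `power1_average` of `120000000` becomes a `hw.gpu.power.draw` of `120.0`.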

Docker

docker run -d \
  --name otel-gpu-collector \
  --device /dev/kfd:/dev/kfd \
  --device /dev/dri:/dev/dri \
  -e OTEL_SERVICE_NAME=my-app \
  -e OTEL_RESOURCE_ATTRIBUTES='deployment.environment=production' \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318 \
  ghcr.io/openlit/otel-gpu-collector:latest

Docker Compose

services:
  otel-gpu-collector:
    image: ghcr.io/openlit/otel-gpu-collector:latest
    environment:
      OTEL_SERVICE_NAME: my-app
      OTEL_RESOURCE_ATTRIBUTES: deployment.environment=production
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    restart: always

Kubernetes (DaemonSet)

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-gpu-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: otel-gpu-collector
  template:
    metadata:
      labels:
        app: otel-gpu-collector
    spec:
      containers:
        - name: collector
          image: ghcr.io/openlit/otel-gpu-collector:latest
          env:
            - name: OTEL_SERVICE_NAME
              value: gpu-collector
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: deployment.environment=production
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://otel-collector.monitoring.svc.cluster.local:4318
          securityContext:
            privileged: false
          volumeMounts:
            - name: sys
              mountPath: /sys
              readOnly: true
      volumes:
        - name: sys
          hostPath:
            path: /sys

Metrics reference

Full metrics list with types, units, and attributes

Configuration

All environment variables and defaults