The collector monitors AMD GPUs by reading the Linux kernel’s sysfs and hwmon interfaces directly. It needs no ROCm, no user-space libraries, and no drivers beyond the standard amdgpu kernel module.
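To make the sysfs approach concrete, here is a minimal sketch of how AMD GPUs can be discovered without ROCm. It is an illustration, not the collector's actual implementation: the function name `find_amd_gpus` and the `sysfs_root` parameter are hypothetical. It relies on the standard DRM layout under `/sys/class/drm` and on `0x1002`, the PCI vendor ID assigned to AMD.

```python
from pathlib import Path
from typing import List

AMD_PCI_VENDOR_ID = "0x1002"  # PCI vendor ID assigned to AMD


def find_amd_gpus(sysfs_root: str = "/sys") -> List[Path]:
    """Return the device directories of DRM cards whose PCI vendor is AMD.

    Hypothetical helper for illustration; sysfs_root exists so the function
    can be pointed at a fake tree in tests.
    """
    drm = Path(sysfs_root) / "class" / "drm"
    if not drm.is_dir():
        return []

    devices = []
    for card in sorted(drm.glob("card[0-9]*")):
        # Skip connector entries such as card0-DP-1; keep card0, card1, ...
        if "-" in card.name:
            continue
        try:
            vendor = (card / "device" / "vendor").read_text().strip()
        except OSError:
            continue  # no readable vendor file for this entry
        if vendor.lower() == AMD_PCI_VENDOR_ID:
            devices.append(card / "device")
    return devices


if __name__ == "__main__":
    for device in find_amd_gpus():
        print(device)
```

On a host with one AMD GPU this would typically print a single path such as `/sys/class/drm/card0/device`; the card index depends on the system.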
Requirements
Linux with the amdgpu kernel driver
Kernel 5.x+ (sysfs/hwmon paths are stable from 5.x onwards)
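The two requirements above can be checked before deploying. The following is a rough preflight sketch, not something shipped with the collector: `check_requirements` and `kernel_major` are hypothetical names, and the check assumes that a loaded amdgpu driver shows up as `/sys/module/amdgpu`, which is the usual behaviour for kernel modules.

```python
import platform
import re
from pathlib import Path
from typing import List, Optional


def kernel_major(release: str) -> int:
    """Extract the major version from a release string like '6.8.0-41-generic'."""
    match = re.match(r"\d+", release)
    return int(match.group()) if match else 0


def check_requirements(release: Optional[str] = None,
                       sysfs_root: str = "/sys") -> List[str]:
    """Return unmet requirements; an empty list means the host looks ready."""
    release = release if release is not None else platform.release()
    problems = []
    if kernel_major(release) < 5:
        problems.append(f"kernel {release} is older than 5.x")
    # A loaded kernel module normally appears as a directory under /sys/module.
    if not (Path(sysfs_root) / "module" / "amdgpu").is_dir():
        problems.append("amdgpu kernel driver does not appear to be loaded")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("missing:", problem)
```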
Collected metrics
| Metric | Description |
| --- | --- |
| hw.gpu.utilization | Compute utilization (0.0–1.0) |
| hw.gpu.memory.utilization | Memory controller utilization (0.0–1.0) |
| hw.gpu.memory.limit | Total VRAM (bytes) |
| hw.gpu.memory.usage | Used VRAM (bytes) |
| hw.gpu.memory.free | Free VRAM (bytes) |
| hw.gpu.temperature | Die temperature (°C) |
| hw.gpu.fan_speed | Fan speed (RPM) |
| hw.gpu.power.draw | Current power draw (W) |
| hw.gpu.power.limit | Power cap (W) |
| hw.gpu.energy.consumed | Cumulative energy (J) |
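As an illustration of where such values can come from, the sketch below reads the raw amdgpu sysfs and hwmon files and converts them to the units listed above. The mapping from files to metric names is an assumption made for this example, not the collector's confirmed implementation, and `collect`, `read_number`, and `scaled` are hypothetical helpers. The file names themselves (`gpu_busy_percent`, `mem_info_vram_total`, `temp1_input`, and so on) are standard amdgpu interfaces: hwmon reports temperature in millidegrees Celsius and power in microwatts. Cumulative energy is left out because its source is not specified here.

```python
from pathlib import Path
from typing import Dict, Optional


def read_number(path: Path) -> Optional[float]:
    """Read a single numeric sysfs value; return None if missing or unreadable."""
    try:
        return float(path.read_text().strip())
    except (OSError, ValueError):
        return None


def scaled(value: Optional[float], divisor: float) -> Optional[float]:
    """Divide a raw value into the target unit, passing None through."""
    return None if value is None else value / divisor


def collect(device: Path) -> Dict[str, Optional[float]]:
    """Build a metric dict for one GPU device directory (assumed mapping)."""
    hwmon_root = device / "hwmon"
    hwmon_dirs = sorted(hwmon_root.glob("hwmon*")) if hwmon_root.is_dir() else []
    hwmon = hwmon_dirs[0] if hwmon_dirs else hwmon_root / "missing"

    total = read_number(device / "mem_info_vram_total")
    used = read_number(device / "mem_info_vram_used")
    free = total - used if total is not None and used is not None else None

    return {
        # Busy percentages (0-100) become ratios (0.0-1.0).
        "hw.gpu.utilization": scaled(read_number(device / "gpu_busy_percent"), 100),
        "hw.gpu.memory.utilization": scaled(read_number(device / "mem_busy_percent"), 100),
        "hw.gpu.memory.limit": total,
        "hw.gpu.memory.usage": used,
        "hw.gpu.memory.free": free,
        # hwmon: millidegrees Celsius -> degrees Celsius.
        "hw.gpu.temperature": scaled(read_number(hwmon / "temp1_input"), 1000),
        "hw.gpu.fan_speed": read_number(hwmon / "fan1_input"),
        # hwmon: microwatts -> watts.
        "hw.gpu.power.draw": scaled(read_number(hwmon / "power1_average"), 1_000_000),
        "hw.gpu.power.limit": scaled(read_number(hwmon / "power1_cap"), 1_000_000),
    }


if __name__ == "__main__":
    for name, value in collect(Path("/sys/class/drm/card0/device")).items():
        print(name, value)
```

Files that a given GPU does not expose simply yield `None`; passively cooled cards, for example, usually have no `fan1_input`.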
Docker
docker run -d \
  --name otel-gpu-collector \
  --device /dev/kfd:/dev/kfd \
  --device /dev/dri:/dev/dri \
  -e OTEL_SERVICE_NAME=my-app \
  -e OTEL_RESOURCE_ATTRIBUTES='deployment.environment=production' \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318 \
  ghcr.io/openlit/otel-gpu-collector:latest
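`OTEL_RESOURCE_ATTRIBUTES` follows the usual OpenTelemetry convention of comma-separated `key=value` pairs, so more attributes can be appended to the one shown above. The sketch below shows roughly how such a string is interpreted. It is a simplified illustration with a hypothetical function name, not the parser the collector uses, and the `service.version` attribute in the usage example is added purely for demonstration.

```python
from typing import Dict
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Split 'k1=v1,k2=v2' into a dict, percent-decoding the values."""
    attributes = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # ignore empty or malformed entries
        key, value = pair.split("=", 1)
        attributes[key.strip()] = unquote(value.strip())
    return attributes


if __name__ == "__main__":
    example = "deployment.environment=production,service.version=1.2.3"
    print(parse_resource_attributes(example))
```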
Docker Compose
services:
  otel-gpu-collector:
    image: ghcr.io/openlit/otel-gpu-collector:latest
    environment:
      OTEL_SERVICE_NAME: my-app
      OTEL_RESOURCE_ATTRIBUTES: deployment.environment=production
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    restart: always
Kubernetes (DaemonSet)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-gpu-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: otel-gpu-collector
  template:
    metadata:
      labels:
        app: otel-gpu-collector
    spec:
      containers:
        - name: collector
          image: ghcr.io/openlit/otel-gpu-collector:latest
          env:
            - name: OTEL_SERVICE_NAME
              value: gpu-collector
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: deployment.environment=production
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://otel-collector.monitoring.svc.cluster.local:4318
          securityContext:
            privileged: false
          volumeMounts:
            - name: sys
              mountPath: /sys
              readOnly: true
      volumes:
        - name: sys
          hostPath:
            path: /sys
Metrics reference: Full metrics list with types, units, and attributes
Configuration: All environment variables and defaults