In this guide you’ll pull the collector Docker image, point it at your OTel backend, and start seeing GPU and host metrics within minutes.

Prerequisites

  • Linux host with NVIDIA, AMD, or Intel GPU (for GPU metrics)
  • Docker installed
  • An OpenTelemetry-compatible backend (OpenLIT, Grafana, Datadog, or any OTLP endpoint)

If you don't have a backend yet, you can run OpenLIT locally:
docker run -d \
  --name openlit \
  -p 3000:3000 \
  -p 4318:4318 \
  ghcr.io/openlit/openlit:latest
Then use http://localhost:4318 as your OTEL_EXPORTER_OTLP_ENDPOINT.
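Before starting the collector, it can help to confirm that the endpoint accepts OTLP over HTTP. The sketch below is one way to do that, not part of OpenLIT: otlp_signal_url and check_otlp_endpoint are hypothetical helper names, and it assumes curl is installed and that the backend accepts OTLP/JSON on the standard /v1/metrics path.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with OpenLIT or Docker.

# Build the per-signal OTLP/HTTP URL from a base endpoint.
otlp_signal_url() {
  base="${1%/}"            # drop one trailing slash, if present
  signal="${2:-metrics}"   # traces, metrics, or logs
  printf '%s/v1/%s\n' "$base" "$signal"
}

# POST an empty OTLP/JSON metrics payload and print the HTTP status code.
# Assumption: the backend accepts JSON-encoded OTLP on this path.
check_otlp_endpoint() {
  url="$(otlp_signal_url "$1" metrics)"
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -X POST -H 'Content-Type: application/json' \
    -d '{"resourceMetrics":[]}' "$url"
}

# Usage (requires a running backend):
#   check_otlp_endpoint http://localhost:4318
```

A 2xx status code suggests the endpoint is reachable and speaks OTLP/HTTP. A connection error usually means the port is not published or the backend is not running yet.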
1. Pull the collector image

docker pull ghcr.io/openlit/otel-gpu-collector:latest
2. Run the collector

docker run -d \
  --name otel-gpu-collector \
  --gpus all \
  -e OTEL_SERVICE_NAME=my-app \
  -e OTEL_RESOURCE_ATTRIBUTES='deployment.environment=production' \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  ghcr.io/openlit/otel-gpu-collector:latest
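The --gpus all flag requires the NVIDIA Container Toolkit on the host. Note that inside the container, localhost refers to the container itself, so http://localhost:4318 reaches a backend on the host only with host networking (--network host). Otherwise, set OTEL_EXPORTER_OTLP_ENDPOINT to an address the container can reach.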
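OTEL_RESOURCE_ATTRIBUTES takes a comma-separated list of key=value pairs, so adding more attributes means extending that string. The sketch below assembles it from separate arguments. join_resource_attrs is a hypothetical helper, and any attribute other than deployment.environment is only an illustration.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of the collector.

# Join key=value arguments into a single comma-separated string.
# Arguments without an "=" are skipped with a warning on stderr.
join_resource_attrs() {
  out=""
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;
      *) echo "skipping malformed attribute: $pair" >&2; continue ;;
    esac
    out="${out:+$out,}$pair"   # add a comma only between entries
  done
  printf '%s\n' "$out"
}

# Usage with docker run (service.version is an illustrative attribute):
#   -e OTEL_RESOURCE_ATTRIBUTES="$(join_resource_attrs \
#        deployment.environment=production service.version=1.2.3)"
```

Building the value this way keeps each attribute on its own line in a deployment script, which tends to be easier to review than one long quoted string.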
3. Verify it's running

docker logs otel-gpu-collector
You should see output like:
time=2024-01-01T00:00:00Z level=INFO msg="starting opentelemetry-gpu-collector"
time=2024-01-01T00:00:00Z level=INFO msg="discovered GPU" address=0000:01:00.0 vendor=nvidia
time=2024-01-01T00:00:00Z level=INFO msg="system metrics collector initialized"
time=2024-01-01T00:00:00Z level=INFO msg="process metrics collector initialized"
time=2024-01-01T00:00:00Z level=INFO msg="collector running"
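In a script, the same check can be automated by scanning the logs for the messages shown above. This is a minimal sketch that assumes the log text matches the sample output, which may differ between collector versions. collector_ready and count_discovered_gpus are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for illustration. Both read collector logs on stdin.

# Exit 0 if the logs contain the "collector running" message.
collector_ready() {
  grep -q 'msg="collector running"'
}

# Print how many "discovered GPU" lines appear in the logs.
count_discovered_gpus() {
  grep -c 'msg="discovered GPU"'
}

# Usage (2>&1 also captures anything the container wrote to stderr):
#   docker logs otel-gpu-collector 2>&1 | collector_ready && echo "collector is up"
#   docker logs otel-gpu-collector 2>&1 | count_discovered_gpus
```

If the GPU count is 0 on a host that has a GPU, the container most likely cannot see the device, so the --gpus flag and the container toolkit are the first things to check.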
4. View metrics in your backend

Open your OTel backend and look for metrics in the hw.gpu.*, system.*, and process.* namespaces. If you are using OpenLIT, go to http://localhost:3000 and open the Metrics section.

Docker Compose

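Add the collector as a service alongside your existing stack. This example assumes the stack already defines a service named otel-collector that accepts OTLP over HTTP on port 4318, and that the GPU is an NVIDIA device: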
services:
  otel-gpu-collector:
    image: ghcr.io/openlit/otel-gpu-collector:latest
    environment:
      OTEL_SERVICE_NAME: my-app
      OTEL_RESOURCE_ATTRIBUTES: deployment.environment=production
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      - otel-collector
    restart: always

Configuration

Full reference for all environment variables and defaults

Metrics reference

Complete list of all metrics, types, units, and attributes