Quickstart - OpenLIT

In this guide you’ll pull the collector Docker image, point it at your OTel backend, and start seeing GPU and host metrics within minutes.

Prerequisites

Linux host with NVIDIA, AMD, or Intel GPU (for GPU metrics)
Docker installed
An OpenTelemetry-compatible backend (OpenLIT, Grafana, Datadog, or any OTLP endpoint)

Don't have an OTel backend? Start OpenLIT locally

docker run -d \
  --name openlit \
  -p 3000:3000 \
  -p 4318:4318 \
  ghcr.io/openlit/openlit:latest

Then use http://localhost:4318 as your OTEL_EXPORTER_OTLP_ENDPOINT.
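Before starting the collector, it can help to confirm that the endpoint accepts OTLP over HTTP. The sketch below assumes the backend follows the OpenTelemetry convention of serving metrics at /v1/metrics under the base endpoint and that curl is installed. otlp_metrics_url and check_otlp_endpoint are hypothetical helper names, not part of OpenLIT.

```shell
# Build the OTLP/HTTP metrics URL from a base endpoint (one trailing slash is tolerated).
otlp_metrics_url() {
  local base="${1%/}"
  printf '%s/v1/metrics\n' "$base"
}

# POST an empty JSON export request and print the HTTP status code.
# A 2xx status suggests the endpoint is reachable and speaks OTLP/HTTP;
# how a given backend treats an empty request is an assumption here.
check_otlp_endpoint() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    -X POST -H 'Content-Type: application/json' \
    -d '{}' "$(otlp_metrics_url "$1")"
}

# Example: check_otlp_endpoint http://localhost:4318
```

If the status code is 000, curl could not connect at all, which usually points to a wrong host or port rather than an OTLP problem.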

Pull the collector image

docker pull ghcr.io/openlit/otel-gpu-collector:latest

Run the collector

NVIDIA GPU

docker run -d \
  --name otel-gpu-collector \
  --gpus all \
  -e OTEL_SERVICE_NAME=my-app \
  -e OTEL_RESOURCE_ATTRIBUTES='deployment.environment=production' \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  ghcr.io/openlit/otel-gpu-collector:latest

Requires the NVIDIA Container Toolkit on the host.

AMD GPU

docker run -d \
  --name otel-gpu-collector \
  --device /dev/kfd:/dev/kfd \
  --device /dev/dri:/dev/dri \
  -e OTEL_SERVICE_NAME=my-app \
  -e OTEL_RESOURCE_ATTRIBUTES='deployment.environment=production' \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  ghcr.io/openlit/otel-gpu-collector:latest

Intel GPU

docker run -d \
  --name otel-gpu-collector \
  --device /dev/dri:/dev/dri \
  -e OTEL_SERVICE_NAME=my-app \
  -e OTEL_RESOURCE_ATTRIBUTES='deployment.environment=production' \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  ghcr.io/openlit/otel-gpu-collector:latest

Requires Linux kernel 5.10+ with the i915 or Xe driver.

Host metrics only

docker run -d \
  --name otel-gpu-collector \
  -e OTEL_SERVICE_NAME=my-app \
  -e OTEL_RESOURCE_ATTRIBUTES='deployment.environment=production' \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  ghcr.io/openlit/otel-gpu-collector:latest

The collector will export host and process metrics even without GPU access.
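The commands above pass http://localhost:4318 straight through to the container. On Docker's default bridge network, localhost inside a container refers to the container itself, so a backend listening on the host may not be reachable that way. One possible workaround, sketched here as an assumption and not taken from the OpenLIT documentation, is to map host.docker.internal to the host gateway (supported by Docker Engine 20.10 and later) and rewrite loopback endpoints accordingly. container_endpoint and run_collector are hypothetical helpers.

```shell
# Rewrite a loopback OTLP endpoint so it points at the Docker host instead.
# Endpoints that do not use localhost or 127.0.0.1 are returned unchanged.
container_endpoint() {
  printf '%s\n' "$1" |
    sed -E 's#^(https?://)(localhost|127\.0\.0\.1)(:|/|$)#\1host.docker.internal\3#'
}

# Start the host-metrics-only variant with the rewritten endpoint.
# --add-host maps host.docker.internal to the host gateway on Linux.
run_collector() {
  docker run -d \
    --name otel-gpu-collector \
    --add-host=host.docker.internal:host-gateway \
    -e OTEL_SERVICE_NAME=my-app \
    -e OTEL_RESOURCE_ATTRIBUTES='deployment.environment=production' \
    -e OTEL_EXPORTER_OTLP_ENDPOINT="$(container_endpoint "$1")" \
    ghcr.io/openlit/otel-gpu-collector:latest
}

# Example: run_collector http://localhost:4318
```

If the backend runs in another container on a shared Docker network, using that container's service name as the hostname (as in the Compose example) avoids the rewrite entirely.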

Verify it's running

docker logs otel-gpu-collector

You should see output like:

time=2024-01-01T00:00:00Z level=INFO msg="starting opentelemetry-gpu-collector"
time=2024-01-01T00:00:00Z level=INFO msg="discovered GPU" address=0000:01:00.0 vendor=nvidia
time=2024-01-01T00:00:00Z level=INFO msg="system metrics collector initialized"
time=2024-01-01T00:00:00Z level=INFO msg="process metrics collector initialized"
time=2024-01-01T00:00:00Z level=INFO msg="collector running"
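The log check can also be scripted, for example in a deployment script that waits for the collector to come up. This sketch assumes the startup lines look exactly like the sample output above (msg="collector running" and vendor=... fields) and that the container is named otel-gpu-collector. The function names are hypothetical.

```shell
# Succeeds if the log text on stdin contains the final startup message.
collector_started() {
  grep -q 'msg="collector running"'
}

# Prints the vendor of each GPU reported in the log text on stdin, one per line.
discovered_gpu_vendors() {
  grep 'msg="discovered GPU"' | sed -n 's/.*vendor=\([^ ]*\).*/\1/p'
}

# Poll the container logs until the collector reports that it is running.
# $1 = container name, $2 = max attempts (default 10), one second apart.
wait_for_collector() {
  local name="$1" attempts="${2:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if docker logs "$name" 2>&1 | collector_started; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Example: wait_for_collector otel-gpu-collector 30 && echo "collector is up"
```

An empty result from discovered_gpu_vendors alongside a successful collector_started would be consistent with the host-metrics-only mode.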

View metrics in your backend

Open your OTel backend and look for metrics in the hw.gpu.*, system.*, and process.* namespaces. If you are using OpenLIT, navigate to http://localhost:3000 and open the Metrics section.

Docker Compose

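Add the collector as a service alongside your existing stack. The example below reserves all NVIDIA GPUs and assumes the stack already defines an otel-collector service that accepts OTLP over HTTP on port 4318: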

services:
  otel-gpu-collector:
    image: ghcr.io/openlit/otel-gpu-collector:latest
    environment:
      OTEL_SERVICE_NAME: my-app
      OTEL_RESOURCE_ATTRIBUTES: deployment.environment=production
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      - otel-collector
    restart: always
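To bring up just this service and inspect its logs, something like the following should work, assuming Docker Compose v2 (the docker compose subcommand) and a compose file in the current directory that contains the service above. The function names are hypothetical wrappers for illustration.

```shell
# Start (or update) the collector service in the background.
# Compose also starts any services listed under depends_on.
compose_up_collector() {
  docker compose up -d otel-gpu-collector
}

# Show the most recent log lines from the collector service.
# $1 = number of lines to show (default 20).
compose_collector_logs() {
  docker compose logs --tail "${1:-20}" otel-gpu-collector
}

# Example:
#   compose_up_collector && compose_collector_logs 50
```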

Configuration

Full reference for all environment variables and defaults

Metrics reference

Complete list of all metrics, types, units, and attributes
