In this guide you’ll pull the collector Docker image, point it at your OTel backend, and start seeing GPU and host metrics within minutes.

Prerequisites

  • Linux host with an NVIDIA, AMD, or Intel GPU (for GPU metrics)
  • Docker installed
  • An OpenTelemetry-compatible backend (OpenLIT, Grafana, Datadog, or any OTLP endpoint)

If you don't have a backend yet, you can run OpenLIT locally:
docker run -d \
  --name openlit \
  -p 3000:3000 \
  -p 4318:4318 \
  ghcr.io/openlit/openlit:latest
Then use http://localhost:4318 as your OTEL_EXPORTER_OTLP_ENDPOINT.
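
Before starting the collector, it can help to confirm that the endpoint accepts OTLP over HTTP. The sketch below is an illustration, not part of OpenLIT's tooling: the function names otlp_metrics_url and probe_otlp are hypothetical, and it assumes the backend serves the standard OTLP/HTTP metrics path /v1/metrics and that curl is installed.

```shell
# Hypothetical helper: build the OTLP/HTTP metrics URL from a base endpoint.
# Falls back to OTEL_EXPORTER_OTLP_ENDPOINT, then to http://localhost:4318.
otlp_metrics_url() {
  otlp_base="${1:-${OTEL_EXPORTER_OTLP_ENDPOINT:-http://localhost:4318}}"
  # Strip one trailing slash so the path is not doubled.
  printf '%s/v1/metrics\n' "${otlp_base%/}"
}

# Hypothetical helper: POST an empty JSON payload and print the HTTP status code.
probe_otlp() {
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -X POST -H 'Content-Type: application/json' -d '{}' \
    "$(otlp_metrics_url "${1:-}")"
}
```

Running probe_otlp http://localhost:4318 prints the HTTP status code. A 2xx response suggests the endpoint is reachable, though how a backend answers an empty payload may vary.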

1. Pull the collector image

docker pull ghcr.io/openlit/otel-gpu-collector:latest

2. Run the collector

docker run -d \
  --name otel-gpu-collector \
  --gpus all \
  -e OTEL_SERVICE_NAME=my-app \
  -e OTEL_RESOURCE_ATTRIBUTES='deployment.environment=production' \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  ghcr.io/openlit/otel-gpu-collector:latest
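Requires the NVIDIA Container Toolkit on the host. Inside the container, localhost refers to the container itself, so if your backend runs on the Docker host, either add --network host to the command above or set OTEL_EXPORTER_OTLP_ENDPOINT to an address the container can reach.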
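
One way to confirm that the NVIDIA Container Toolkit is working is to run nvidia-smi inside a throwaway container. This is a sketch rather than an official OpenLIT step: check_gpu_runtime is a hypothetical helper name, the ubuntu image is an arbitrary choice, and the DOCKER variable exists only so that the binary can be swapped out.

```shell
# Hypothetical helper: run nvidia-smi in a disposable container.
# If the toolkit is set up correctly, nvidia-smi lists the host GPUs;
# otherwise the command fails.
check_gpu_runtime() {
  "${DOCKER:-docker}" run --rm --gpus all ubuntu nvidia-smi
}
```

Call check_gpu_runtime before starting the collector. If it fails, the collector container is unlikely to see the GPUs either.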

3. Verify it's running

docker logs otel-gpu-collector
You should see output like:
time=2024-01-01T00:00:00Z level=INFO msg="starting opentelemetry-gpu-collector"
time=2024-01-01T00:00:00Z level=INFO msg="discovered GPU" address=0000:01:00.0 vendor=nvidia
time=2024-01-01T00:00:00Z level=INFO msg="system metrics collector initialized"
time=2024-01-01T00:00:00Z level=INFO msg="process metrics collector initialized"
time=2024-01-01T00:00:00Z level=INFO msg="collector running"
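
The same check can be scripted, for example in a deployment script. The sketch below assumes the log messages match the sample output above. collector_ready and count_discovered_gpus are hypothetical helper names, not commands shipped with the collector.

```shell
# Hypothetical helper: succeed (exit 0) if the log text on stdin contains
# the "collector running" message.
collector_ready() {
  grep -q 'msg="collector running"'
}

# Hypothetical helper: print how many "discovered GPU" lines appear on stdin.
# grep -c exits non-zero when the count is 0, so the status is ignored.
count_discovered_gpus() {
  grep -c 'msg="discovered GPU"' || true
}
```

For example, docker logs otel-gpu-collector 2>&1 | collector_ready && echo "collector is up" reports success only once the final startup message has been logged. The 2>&1 is there because docker logs replays the container's stderr on stderr.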

4. View metrics in your backend

Open your OTel backend and look for metrics in the hw.gpu.*, system.*, and process.* namespaces. If you are using OpenLIT, navigate to http://localhost:3000 and go to the Metrics section.

Docker Compose

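Add the collector as a service alongside your existing stack. The example assumes the stack already defines a service named otel-collector that accepts OTLP over HTTP on port 4318: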
services:
  otel-gpu-collector:
    image: ghcr.io/openlit/otel-gpu-collector:latest
    environment:
      OTEL_SERVICE_NAME: my-app
      OTEL_RESOURCE_ATTRIBUTES: deployment.environment=production
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      - otel-collector
    restart: always
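
Assuming the snippet above lives in a compose file in the current directory and the Docker Compose v2 plugin is installed, the service can be started and inspected roughly as follows. The function names are hypothetical, and the DOCKER variable is only a convenience for swapping the binary.

```shell
# Hypothetical helper: start the collector (and the services it depends on)
# in the background.
start_gpu_collector() {
  "${DOCKER:-docker}" compose up -d otel-gpu-collector
}

# Hypothetical helper: show the last 20 log lines of the collector service.
gpu_collector_logs() {
  "${DOCKER:-docker}" compose logs --tail 20 otel-gpu-collector
}
```

After start_gpu_collector, the output of gpu_collector_logs should resemble the startup log shown in the verification step.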

Configuration

Full reference for all environment variables and defaults

Metrics reference

Complete list of all metrics, types, units, and attributes