Monitor AMD GPUs using OpenTelemetry

OpenLIT uses OpenTelemetry to help you monitor AMD GPUs. This includes tracking GPU metrics like utilization, temperature, memory usage and power consumption.

Get Started

Using the SDK

Collect and send GPU performance metrics directly from your application to an OpenTelemetry endpoint.

Using the Collector

Install the OpenTelemetry GPU Collector as a Docker container to collect and send GPU performance metrics to an OpenTelemetry endpoint.

Using the SDK

Install OpenLIT

Open your command line or terminal and run:

pip install openlit

Initialize OpenLIT in your Application

You can set up OpenLIT in your application using either function arguments directly in your code or by using environment variables.

Add the following lines to your application code:

import openlit

openlit.init(
  otlp_endpoint="YOUR_OTEL_ENDPOINT",
  otlp_headers="YOUR_OTEL_HEADERS",
  collect_gpu_stats=True,
)

Replace:

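YOUR_OTEL_ENDPOINT with the URL of your OpenTelemetry backend, such as http://127.0.0.1:4318 if you are using OpenLIT and a local OTel Collector.

YOUR_OTEL_HEADERS with the headers used to authenticate with your OTLP endpoint. You can ignore this if you are using OpenLIT.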
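As a rough sketch of the environment-variable route, the helper below reads the endpoint and headers from the environment and builds keyword arguments for `openlit.init`. The helper name `openlit_init_kwargs` is hypothetical, and the sketch assumes the same `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_HEADERS` variable names that the collector uses. Only the `otlp_endpoint`, `otlp_headers`, and `collect_gpu_stats` arguments come from the snippet above.

```python
import os


def openlit_init_kwargs(env=None):
    """Build keyword arguments for openlit.init() from environment variables.

    Hypothetical helper: the variable names are assumed to match the
    ones used by the OTel GPU Collector.
    """
    env = os.environ if env is None else env
    kwargs = {"collect_gpu_stats": True}

    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if endpoint:
        kwargs["otlp_endpoint"] = endpoint

    headers = env.get("OTEL_EXPORTER_OTLP_HEADERS")
    if headers:
        kwargs["otlp_headers"] = headers

    return kwargs


# Example with an explicit mapping instead of the real environment.
example = openlit_init_kwargs({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://127.0.0.1:4318"})
print(example)

# In an application with openlit installed, this could be used as:
#   import openlit
#   openlit.init(**openlit_init_kwargs())
```

Keeping the lookup in one function makes it easy to see which values reach `openlit.init` and to test the configuration without a GPU or a running backend.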

To send metrics to other Observability tools, refer to the Connections Guide.

For more advanced configurations and application use cases, visit the OpenLIT Python repository.

Using the Collector

Pull `otel-gpu-collector` Docker Image

You can quickly start using the OTel GPU Collector by pulling the Docker image:

docker pull ghcr.io/openlit/otel-gpu-collector:latest

Run `otel-gpu-collector` Docker container

Here’s a quick example showing how to run the container with the required environment variables:

docker run --gpus all \
    -e GPU_APPLICATION_NAME='chatbot' \
    -e GPU_ENVIRONMENT='staging' \
    -e OTEL_EXPORTER_OTLP_ENDPOINT="YOUR_OTEL_ENDPOINT" \
    -e OTEL_EXPORTER_OTLP_HEADERS="YOUR_OTEL_HEADERS" \
    ghcr.io/openlit/otel-gpu-collector:latest
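If you launch the collector from a script, the same command can be assembled in Python. This is a minimal sketch: the function name `build_docker_run_args` is hypothetical, and the flags, variable names, and image mirror the command above. The defaults for the application name and environment are taken from the Environment Variables table.

```python
import shlex

IMAGE = "ghcr.io/openlit/otel-gpu-collector:latest"


def build_docker_run_args(endpoint, headers=None,
                          app_name="default_app", environment="production"):
    """Return the argument list for `docker run`, mirroring the command above."""
    env = {
        "GPU_APPLICATION_NAME": app_name,
        "GPU_ENVIRONMENT": environment,
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
    }
    # Headers are optional, so only pass them when they are set.
    if headers:
        env["OTEL_EXPORTER_OTLP_HEADERS"] = headers

    args = ["docker", "run", "--gpus", "all"]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    args.append(IMAGE)
    return args


args = build_docker_run_args(
    "http://127.0.0.1:4318", app_name="chatbot", environment="staging"
)
# Print the command for review before running it.
print(shlex.join(args))

# To actually start the container (requires Docker):
#   import subprocess
#   subprocess.run(args, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when header values contain spaces or special characters.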

For more advanced configurations of the collector, visit the OTel GPU Collector repository.

Note: If you’ve deployed OpenLIT using Docker Compose, either use the host’s IP address or add the OTel GPU Collector to your Docker Compose file:

Docker Compose: Add the following config under `services`

otel-gpu-collector:
  image: ghcr.io/openlit/otel-gpu-collector:latest
  environment:
    GPU_APPLICATION_NAME: 'chatbot'
    GPU_ENVIRONMENT: 'staging'
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4318"
  device_requests:
  - driver: nvidia
    count: all
    capabilities: [gpu]
  depends_on:
  - otel-collector
  restart: always

Host IP: Use the Host IP to connect to OTel Collector
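One way to look up the host's IP address and build the endpoint URL is sketched below. This is a general networking technique, not something taken from OpenLIT: the function names are hypothetical, the probe address `8.8.8.8` is an arbitrary routable address, and port `4318` is assumed from the endpoint examples on this page.

```python
import socket


def host_ip(probe_addr=("8.8.8.8", 80)):
    """Return the IP of the interface used for outbound traffic.

    Falls back to 127.0.0.1 when no route is available.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects a route,
        # which tells us which local address would be used.
        sock.connect(probe_addr)
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def otlp_endpoint(ip, port=4318):
    """Format an OTLP HTTP endpoint URL for the given host IP."""
    return f"http://{ip}:{port}"


print(otlp_endpoint(host_ip()))
```

Run this on the host, not inside the container, and pass the printed URL as `OTEL_EXPORTER_OTLP_ENDPOINT`. If it prints the `127.0.0.1` fallback, that address will not reach the host from inside a container, so set the IP manually instead.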

Environment Variables

OTel GPU Collector supports several environment variables for configuration. Below is a table that describes each variable:

Environment Variable	Description	Default Value
`GPU_APPLICATION_NAME`	Name of the application running on the GPU	`default_app`
`GPU_ENVIRONMENT`	Environment name (e.g., staging, production)	`production`
`OTEL_EXPORTER_OTLP_ENDPOINT`	OpenTelemetry OTLP endpoint URL	(required)
`OTEL_EXPORTER_OTLP_HEADERS`	Headers for authenticating with the OTLP endpoint	Ignore if using OpenLIT
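Before starting the container, a small pre-flight check can apply the defaults from the table above and fail early when the required endpoint is missing. This is only a sketch of that idea. The function name `resolve_collector_env` is hypothetical, and the collector's own handling of missing values may differ.

```python
import os

# Defaults as listed in the table above.
DEFAULTS = {
    "GPU_APPLICATION_NAME": "default_app",
    "GPU_ENVIRONMENT": "production",
}


def resolve_collector_env(env=None):
    """Return the effective collector settings, applying documented defaults.

    Raises ValueError if the required OTLP endpoint is not set.
    """
    env = os.environ if env is None else env

    endpoint = (env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or "").strip()
    if not endpoint:
        raise ValueError("OTEL_EXPORTER_OTLP_ENDPOINT is required")

    # Use the supplied value when present, otherwise the documented default.
    resolved = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    resolved["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint

    # Headers are optional and can be left out when sending to OpenLIT.
    headers = env.get("OTEL_EXPORTER_OTLP_HEADERS")
    if headers:
        resolved["OTEL_EXPORTER_OTLP_HEADERS"] = headers

    return resolved


settings = resolve_collector_env({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://127.0.0.1:4318"})
for name, value in settings.items():
    print(f"{name}={value}")
```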

Collected Metrics

Details on the types of metrics collected and their descriptions.

Connections

Connect to your existing Observability stack.

SDK configuration

Documentation of the configuration options for the OpenLIT SDK.
