Monitor AMD GPUs using OpenTelemetry
OpenLIT uses OpenTelemetry to help you monitor AMD GPUs. This includes tracking GPU metrics like utilization, temperature, memory usage and power consumption.
Get Started
Using the SDK
Collect and send GPU performance metrics directly from your application to an OpenTelemetry endpoint.
Using the Collector
Install the OpenTelemetry GPU Collector as a Docker container to collect and send GPU performance metrics to an OpenTelemetry endpoint.
Using the SDK
Install OpenLIT
Open your command line or terminal and run `pip install openlit`.
Initialize OpenLIT in your Application
You can set up OpenLIT in your application using either function arguments directly in your code or by using environment variables.
Add the following two lines to your application code:
Replace `YOUR_OTEL_ENDPOINT` with the URL of your OpenTelemetry backend, such as `http://127.0.0.1:4318` if you are using OpenLIT and a local OTel Collector.
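As a minimal sketch of the function-argument setup, assuming `openlit.init` accepts `otlp_endpoint` and `collect_gpu_stats` keyword arguments (verify the current signature in the OpenLIT Python repository). The helper names `gpu_init_kwargs` and `start_gpu_monitoring` are hypothetical and exist only for illustration:

```python
# Local OTel Collector endpoint, taken from the example URL in the text.
DEFAULT_OTLP_ENDPOINT = "http://127.0.0.1:4318"


def gpu_init_kwargs(otlp_endpoint: str = DEFAULT_OTLP_ENDPOINT) -> dict:
    """Build keyword arguments for openlit.init (argument names are assumed)."""
    return {"otlp_endpoint": otlp_endpoint, "collect_gpu_stats": True}


def start_gpu_monitoring(otlp_endpoint: str = DEFAULT_OTLP_ENDPOINT) -> None:
    """Initialize OpenLIT with GPU metrics collection enabled."""
    # Imported lazily so the module loads even where openlit is not installed.
    import openlit

    openlit.init(**gpu_init_kwargs(otlp_endpoint))
```

Calling `start_gpu_monitoring()` once at application startup should be enough. In a small script, the import and the `openlit.init(...)` call can simply be written inline as two lines.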
To configure OpenLIT through environment variables instead, add the following two lines to your application code:
Then configure your OTLP endpoint using an environment variable:
Replace `YOUR_OTEL_ENDPOINT` with the URL of your OpenTelemetry backend, such as `http://127.0.0.1:4318` if you are using OpenLIT and a local OTel Collector.
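A minimal sketch of the environment-variable setup, assuming OpenLIT reads the standard `OTEL_EXPORTER_OTLP_ENDPOINT` variable when no endpoint argument is passed and that `openlit.init` accepts `collect_gpu_stats`. The helper names `ensure_otlp_endpoint` and `start_gpu_monitoring` are hypothetical:

```python
import os
from typing import MutableMapping, Optional

OTLP_ENDPOINT_VAR = "OTEL_EXPORTER_OTLP_ENDPOINT"
# Local OTel Collector endpoint, taken from the example URL in the text.
DEFAULT_OTLP_ENDPOINT = "http://127.0.0.1:4318"


def ensure_otlp_endpoint(
    env: Optional[MutableMapping[str, str]] = None,
    default: str = DEFAULT_OTLP_ENDPOINT,
) -> str:
    """Return the configured endpoint, setting the default if none is present."""
    env = os.environ if env is None else env
    return env.setdefault(OTLP_ENDPOINT_VAR, default)


def start_gpu_monitoring() -> None:
    """Initialize OpenLIT, leaving the endpoint to the environment variable."""
    ensure_otlp_endpoint()
    # Imported lazily so the module loads even where openlit is not installed.
    import openlit

    # No endpoint argument: assumed to be read from OTEL_EXPORTER_OTLP_ENDPOINT.
    openlit.init(collect_gpu_stats=True)
```

In most deployments the variable would be exported by the shell, the container runtime, or the orchestrator rather than set from Python. The fallback here only keeps local runs working without extra setup.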
To send metrics to other observability tools, refer to the Connections Guide.
For more advanced configurations and application use cases, visit the OpenLIT Python repository.
Using the Collector
Pull `otel-gpu-collector` Docker Image
You can quickly start using the OTel GPU Collector by pulling the Docker image:
Run `otel-gpu-collector` Docker container
Here’s a quick example showing how to run the container with the required environment variables:
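One way to assemble such a command is sketched below in Python, passing each setting with Docker's `-e NAME=value` flag. The image reference `YOUR_COLLECTOR_IMAGE` is a placeholder, the values `chatbot` and `staging` are illustrative, and the helper name `docker_run_command` is hypothetical. Any device-access flags an AMD GPU container needs are deliberately left out here and should be taken from the OTel GPU Collector repository:

```python
import shlex
from typing import Dict, List


def docker_run_command(image: str, env: Dict[str, str]) -> List[str]:
    """Build a `docker run` argument list with one -e flag per variable."""
    argv = ["docker", "run"]
    for name, value in env.items():
        argv += ["-e", f"{name}={value}"]
    argv.append(image)
    return argv


command = docker_run_command(
    image="YOUR_COLLECTOR_IMAGE",  # placeholder, not a real image reference
    env={
        "GPU_APPLICATION_NAME": "chatbot",  # illustrative value
        "GPU_ENVIRONMENT": "staging",  # illustrative value
        "OTEL_EXPORTER_OTLP_ENDPOINT": "YOUR_OTEL_ENDPOINT",
    },
)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(command))
```

Building the command as a list keeps values with spaces or special characters intact if it is later passed to `subprocess.run(command, check=True)`.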
For more advanced configurations of the collector, visit the OTel GPU Collector repository.
Note: If you’ve deployed OpenLIT using Docker Compose, either use the host’s IP address to reach the OTel Collector or add the OTel GPU Collector to the same Docker Compose file:
Docker Compose: Add the following config under `services`
Host IP: Use the Host IP to connect to OTel Collector
Environment Variables
OTel GPU Collector supports several environment variables for configuration. Below is a table that describes each variable:
Environment Variable | Description | Default Value |
---|---|---|
GPU_APPLICATION_NAME | Name of the application running on the GPU | default_app |
GPU_ENVIRONMENT | Environment name (e.g., staging, production) | production |
OTEL_EXPORTER_OTLP_ENDPOINT | OpenTelemetry OTLP endpoint URL | (required) |
OTEL_EXPORTER_OTLP_HEADERS | Headers for authenticating with the OTLP endpoint | Ignore if using OpenLIT |
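The defaults and the one required value in the table can be captured in a small helper. This is a sketch of how a deployment script might validate the settings before starting the collector, not part of the collector itself; the function name `collector_env` is hypothetical:

```python
from typing import Dict, Optional


def collector_env(
    otlp_endpoint: str,
    application_name: str = "default_app",
    environment: str = "production",
    otlp_headers: Optional[str] = None,
) -> Dict[str, str]:
    """Return the collector's environment variables with table defaults applied."""
    if not otlp_endpoint:
        # The endpoint has no default in the table, so fail early.
        raise ValueError("OTEL_EXPORTER_OTLP_ENDPOINT is required")
    env = {
        "GPU_APPLICATION_NAME": application_name,
        "GPU_ENVIRONMENT": environment,
        "OTEL_EXPORTER_OTLP_ENDPOINT": otlp_endpoint,
    }
    # Headers are only needed for backends that require authentication.
    if otlp_headers:
        env["OTEL_EXPORTER_OTLP_HEADERS"] = otlp_headers
    return env
```

For example, `collector_env("http://127.0.0.1:4318")` yields the two defaults plus the endpoint and omits the headers variable, which matches the table's note that headers can be ignored when using OpenLIT.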
Collected Metrics
Details on the types of metrics collected and their descriptions.