Monitor NVIDIA GPUs with key metrics like usage, temperature, and power using OpenTelemetry
OpenLIT uses OpenTelemetry to help you monitor NVIDIA GPUs. This includes tracking GPU metrics like utilization, temperature, memory usage and power consumption.
Collect and send GPU performance metrics directly from your application to an OpenTelemetry endpoint.
Install the OpenTelemetry GPU Collector as a Docker container to collect and send GPU performance metrics to an OpenTelemetry endpoint.
Using the SDK
Install OpenLIT
Open your command line or terminal and run:
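The OpenLIT SDK is distributed on PyPI, so a standard pip install is all that's needed (shown here for the Python SDK):

```shell
# Install the OpenLIT Python SDK from PyPI
pip install openlit
```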
Initialize OpenLIT in your Application
You can set up OpenLIT in your application using either function arguments directly in your code or by using environment variables.
Add the following two lines to your application code:
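A minimal sketch of those two lines, passing the endpoint as a function argument. The `collect_gpu_stats` flag enables GPU metric collection; the endpoint value is a placeholder you replace below:

```python
import openlit

# otlp_endpoint: your OpenTelemetry backend URL
# collect_gpu_stats=True: enables NVIDIA GPU metric collection
openlit.init(otlp_endpoint="YOUR_OTEL_ENDPOINT", collect_gpu_stats=True)
```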
Replace `YOUR_OTEL_ENDPOINT` with the URL of your OpenTelemetry backend, such as `http://127.0.0.1:4318` if you are using OpenLIT and a local OTel Collector.
Alternatively, if you prefer environment variables, add the following two lines to your application code:
Then, configure your OTLP endpoint using an environment variable:
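For example, in the shell that launches your application (the URL shown assumes a local OTel Collector):

```shell
# Standard OTLP exporter variable read by OpenTelemetry SDKs
export OTEL_EXPORTER_OTLP_ENDPOINT="YOUR_OTEL_ENDPOINT"
```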
Replace `YOUR_OTEL_ENDPOINT` with the URL of your OpenTelemetry backend, such as `http://127.0.0.1:4318` if you are using OpenLIT and a local OTel Collector.

To send metrics to other observability tools, refer to the Connections Guide.
For more advanced configurations and application use cases, visit the OpenLIT Python repository.
Using the Collector
Pull `otel-gpu-collector` Docker Image
You can quickly start using the OTel GPU Collector by pulling the Docker image:
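For example (the `ghcr.io/openlit/otel-gpu-collector` image name and `latest` tag are what the project publishes at the time of writing; verify against the OTel GPU Collector repository):

```shell
# Pull the prebuilt collector image from GitHub Container Registry
docker pull ghcr.io/openlit/otel-gpu-collector:latest
```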
Run `otel-gpu-collector` Docker container
Here's a quick example showing how to run the container with the required environment variables:
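A sketch of such a run. The application name, environment, and endpoint values are illustrative; `--gpus all` assumes the NVIDIA Container Toolkit is installed on the host:

```shell
docker run --rm \
  --gpus all \
  -e GPU_APPLICATION_NAME='chatbot' \
  -e GPU_ENVIRONMENT='staging' \
  -e OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318" \
  ghcr.io/openlit/otel-gpu-collector:latest
```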
For more advanced configurations of the collector, visit the OTel GPU Collector repository.
Note: If you’ve deployed OpenLIT using Docker Compose, make sure to use the host’s IP address or add OTel GPU Collector to the Docker Compose:
Docker Compose: Add the following config under `services`
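A sketch of such a service entry, assuming the image name above and illustrative values for the application name and environment; the `deploy.resources` block requests GPU access via the NVIDIA runtime:

```yaml
otel-gpu-collector:
  image: ghcr.io/openlit/otel-gpu-collector:latest
  environment:
    GPU_APPLICATION_NAME: 'chatbot'
    GPU_ENVIRONMENT: 'staging'
    # Service name of the collector inside the Compose network
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4318"
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
```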
Host IP: Use the Host IP to connect to OTel Collector
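If you keep the collector outside Compose, point it at the host's address rather than `127.0.0.1` (the IP below is a hypothetical example; substitute your host's actual address):

```shell
# 192.168.1.10 is a placeholder for your Docker host's IP
export OTEL_EXPORTER_OTLP_ENDPOINT="http://192.168.1.10:4318"
```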
OTel GPU Collector supports several environment variables for configuration. Below is a table that describes each variable:
| Environment Variable | Description | Default Value |
|---|---|---|
| `GPU_APPLICATION_NAME` | Name of the application running on the GPU | `default_app` |
| `GPU_ENVIRONMENT` | Environment name (e.g., `staging`, `production`) | `production` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OpenTelemetry OTLP endpoint URL | (required) |
| `OTEL_EXPORTER_OTLP_HEADERS` | Headers for authenticating with the OTLP endpoint | Ignore if using OpenLIT |
Details on the types of metrics collected and their descriptions.
Connect to your existing Observability Stack
Documentation of the configuration options for the OpenLIT SDK.