Monitor NVIDIA and AMD GPUs running AI workloads by collecting key metrics such as utilization, temperature, and power draw with OpenTelemetry.

The following parameters configure GPU monitoring:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `collect_system_metrics` | boolean | `False` | Enable GPU and system metrics collection |
| `otlp_endpoint` | string | `None` | OpenTelemetry OTLP endpoint URL |
| `otlp_headers` | string | `None` | Authentication headers for the OTLP endpoint |
| `service_name` | string | `"unknown_service"` | Name of your AI application |
| `environment` | string | `None` | Deployment environment (for example dev, staging, prod) |
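
As a minimal sketch, the parameters in the table could be collected into keyword arguments and handed to the SDK. This assumes the SDK's `openlit.init()` entry point accepts these names as keyword arguments; `build_config` and `start_monitoring` are hypothetical helpers, and the endpoint and names are placeholder values.

```python
from typing import Any, Dict


def build_config() -> Dict[str, Any]:
    """Return keyword arguments matching the parameter table (placeholder values)."""
    return {
        "collect_system_metrics": True,            # default is False, so opt in explicitly
        "otlp_endpoint": "http://127.0.0.1:4318",  # placeholder: a local OTLP endpoint
        "otlp_headers": None,                      # set this when the endpoint needs authentication
        "service_name": "my-gpu-app",              # placeholder application name
        "environment": "production",               # for example dev, staging, prod
    }


def start_monitoring(config: Dict[str, Any]) -> bool:
    """Call openlit.init() if the package is installed; return True when it was called."""
    try:
        import openlit  # assumption: the OpenLIT Python SDK is installed
    except ImportError:
        return False
    openlit.init(**config)  # assumption: init() accepts the table's parameter names
    return True


config = build_config()
```

Calling `start_monitoring(config)` once at application startup, before any GPU work begins, would be the natural place to enable collection.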

GPU monitoring can also be configured through environment variables:

| Variable | Description | Example |
|---|---|---|
| `OPENLIT_COLLECT_SYSTEM_METRICS` | Enable GPU monitoring | `true` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint URL | `http://127.0.0.1:4318` |
| `OTEL_SERVICE_NAME` | Service name for telemetry | `my-gpu-app` |
| `OTEL_DEPLOYMENT_ENVIRONMENT` | Deployment environment | `production` |
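
These variables are normally set in a shell profile or deployment manifest. Where that is not convenient, one option is to set them from Python before the SDK is initialized. The sketch below uses the example values from the table; `apply_env_defaults` is a hypothetical helper, and it leaves any variable that is already defined untouched.

```python
import os
from typing import Dict

# Example values taken from the table above; adjust them for your deployment.
GPU_MONITORING_ENV: Dict[str, str] = {
    "OPENLIT_COLLECT_SYSTEM_METRICS": "true",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://127.0.0.1:4318",
    "OTEL_SERVICE_NAME": "my-gpu-app",
    "OTEL_DEPLOYMENT_ENVIRONMENT": "production",
}


def apply_env_defaults(values: Dict[str, str]) -> Dict[str, str]:
    """Set each variable unless it is already defined; return the effective values."""
    for name, value in values.items():
        os.environ.setdefault(name, value)
    return {name: os.environ[name] for name in values}


effective = apply_env_defaults(GPU_MONITORING_ENV)
```

Because `setdefault` is used, values exported by the surrounding environment take precedence over the ones hard-coded here.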