LLM Application Metrics
OpenLIT offers automatic instrumentation with OpenTelemetry for various LLM providers, frameworks, and VectorDBs, enabling you to gain valuable insights into the behavior and performance of your LLM applications through metrics.
This page covers the metrics settings, the semantic conventions OpenLIT follows, and the attributes attached to each metric, so you can monitor your LLM applications more effectively.
Quickstart
Get Started with monitoring your LLM Applications in 2 simple steps
Connections
Connect to your existing Observability Stack
Disable Metrics
You have the option to disable the collection of metrics if needed. By default, metrics collection is enabled.
Example:
import openlit

# Disable metrics collection
openlit.init(disable_metrics=True)
Using an existing OTel Metrics instance
If your application already has an OpenTelemetry (OTel) Metrics instance, you can pass it directly to OpenLIT with openlit.init(meter=meter).
OpenLIT then uses the settings of your existing meter, which keeps the metrics setup unified across your application.
Example:
# Instantiate an OpenTelemetry Metrics meter
meter = ...
# Pass the meter to OpenLIT
openlit.init(meter=meter)
Add custom resource attributes
The OTEL_RESOURCE_ATTRIBUTES environment variable allows you to provide additional OpenTelemetry resource attributes when starting your application with OpenLIT. OpenLIT already includes some default resource attributes:
telemetry.sdk.name: openlit
service.name: YOUR_SERVICE_NAME
deployment.environment: YOUR_ENVIRONMENT_NAME
You can extend these default resource attributes by adding your own through the OTEL_RESOURCE_ATTRIBUTES variable. Your custom attributes are added on top of the existing OpenLIT attributes, providing additional context to your telemetry data. Format your attributes as key1=value1,key2=value2.
For example:
export OTEL_RESOURCE_ATTRIBUTES="service.instance.id=YOUR_SERVICE_ID,k8s.pod.name=K8S_POD_NAME,k8s.namespace.name=K8S_NAMESPACE,k8s.node.name=K8S_NODE_NAME"
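If you prefer to set the variable from Python rather than the shell, the key1=value1,key2=value2 string can be assembled from a dictionary. The following is a minimal sketch: format_resource_attributes is a hypothetical helper, not part of OpenLIT, and the attribute values are placeholders. It percent-encodes values so that a stray comma or equals sign does not break the format, and it assumes the variable is set before OpenLIT (or the OTel SDK) is initialized.

```python
import os
from urllib.parse import quote


def format_resource_attributes(attrs: dict[str, str]) -> str:
    """Join attributes as key1=value1,key2=value2, percent-encoding each value."""
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in attrs.items()
    )


# Placeholder values; replace them with the ones from your own environment.
attrs = {
    "service.instance.id": "instance-1",
    "k8s.pod.name": "chat-api-0",
    "k8s.namespace.name": "llm-apps",
}

# Set the variable before initializing OpenLIT so the attributes can be picked up.
os.environ["OTEL_RESOURCE_ATTRIBUTES"] = format_resource_attributes(attrs)
```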
Semantic Convention
This section outlines the OpenTelemetry metrics collected by OpenLIT from applications using LLMs and Vector Databases. These metrics offer a straightforward overview of application performance and resource usage. They serve as a supplement to the detailed data captured through tracing, aiding in the easy creation of dashboards for quick monitoring of system usage and performance.
GenAI/LLM Metrics
Metric Name | Description | Unit | Type | Attributes |
---|---|---|---|---|
gen_ai.total.requests | Number of requests to the LLM. | 1 | Counter | telemetry.sdk.name , gen_ai.application_name , gen_ai.system , gen_ai.environment , gen_ai.operation.name , gen_ai.request.model |
gen_ai.usage.input_tokens | Number of input tokens processed. | 1 | Counter | telemetry.sdk.name , gen_ai.application_name , gen_ai.system , gen_ai.environment , gen_ai.operation.name , gen_ai.request.model |
gen_ai.usage.output_tokens | Number of output tokens processed. | 1 | Counter | telemetry.sdk.name , gen_ai.application_name , gen_ai.system , gen_ai.environment , gen_ai.operation.name , gen_ai.request.model |
gen_ai.usage.total_tokens | Total number of tokens processed. | 1 | Counter | telemetry.sdk.name , gen_ai.application_name , gen_ai.system , gen_ai.environment , gen_ai.operation.name , gen_ai.request.model |
gen_ai.usage.cost | The cost distribution of LLM requests. | USD | Histogram | telemetry.sdk.name , gen_ai.application_name , gen_ai.system , gen_ai.environment , gen_ai.operation.name , gen_ai.request.model |
VectorDB Metrics
Metric Name | Description | Unit | Type | Attributes |
---|---|---|---|---|
db.total.requests | Number of requests to VectorDBs. | 1 | Counter | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment |
GPU Metrics
Metric Name | Description | Unit | Type | Attributes |
---|---|---|---|---|
gpu.utilization | GPU Utilization in percentage | percent | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |
gpu.enc.utilization | GPU encoder Utilization in percentage | percent | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |
gpu.dec.utilization | GPU decoder Utilization in percentage | percent | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |
gpu.temperature | GPU Temperature in Celsius | Celsius | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |
gpu.fan_speed | GPU Fan Speed (0-100) as an integer | Integer | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |
gpu.memory.available | Available GPU Memory in MB | MB | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |
gpu.memory.total | Total GPU Memory in MB | MB | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |
gpu.memory.used | Used GPU Memory in MB | MB | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |
gpu.memory.free | Free GPU Memory in MB | MB | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |
gpu.power.draw | GPU Power Draw in Watts | Watt | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |
gpu.power.limit | GPU Power Limit in Watts | Watt | Gauge | telemetry.sdk.name , gen_ai.application_name , gen_ai.environment , gpu_index , gpu_name , gpu_uuid |