Overview
The OpenLIT SDK provides server-side evaluations via `openlit.eval()` (Python) and `openlit.eval()` (JS/TS). Evaluations use the same engine, rules, contexts, and custom eval types configured in the OpenLIT dashboard, and they work identically in the development (offline) and production (online) stages.
Quick Start
Run your first offline evaluation in 3 lines of code.
Batch Evaluation
Evaluate multiple prompt/response pairs concurrently.
Attributes & Rules
Auto-resolve OTel attributes for context-aware evaluations.
Offline Evaluations
Offline evaluations run on the OpenLIT server using the same evaluation engine as online/auto evaluations. The SDK sends your prompt and response to the server, which runs an LLM-as-judge evaluation and returns structured results.
Prerequisites
- A running OpenLIT instance with evaluation configured in the dashboard.
- An OpenLIT API key (create one in the dashboard under Settings > API Keys).
Quick Start
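A minimal Python sketch of a first evaluation, assuming the `openlit` package is installed and that `OPENLIT_URL` and `OPENLIT_API_KEY` are set in the environment. The helper name `quick_eval` and its `evaluate` argument are hypothetical and exist only so the call can be exercised with a stub; `openlit.eval()` and the `prompt`, `response`, and `contexts` parameters are taken from the parameter table on this page.

```python
from typing import Any, Callable, List, Optional


def quick_eval(
    prompt: str,
    response: str,
    contexts: Optional[List[str]] = None,
    evaluate: Optional[Callable[..., Any]] = None,
) -> Any:
    """Run one offline evaluation and return whatever the evaluator returns."""
    if evaluate is None:
        # Imported lazily so this helper can be tested without the SDK or a server.
        import openlit

        evaluate = openlit.eval
    return evaluate(prompt=prompt, response=response, contexts=contexts)


# Example call (needs a reachable OpenLIT server, so it is left commented out):
# result = quick_eval(
#     prompt="What is the capital of France?",
#     response="The capital of France is Lyon.",
#     contexts=["Paris is the capital of France."],
# )
# print(result.passed)
```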
openlit.eval() / openlit.eval({}) Parameters
The table below lists the Python parameters.
| Parameter | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | The user prompt sent to the LLM. Required. | n/a |
| `response` | `str` | The LLM's response to evaluate. Required. | n/a |
| `contexts` | `list[str]` | Ground truth context for the evaluation. | `None` |
| `eval_types` | `list[str]` | Specific eval types to run (e.g. `["hallucination", "toxicity"]`). Runs all enabled types if omitted. | `None` |
| `attributes` | `dict` | Trace attributes for rule engine matching (overrides auto-resolved attributes). | `None` |
| `threshold_score` | `float` | Score threshold for verdict determination. | `0.5` |
| `store_results` | `bool` | Whether to store results in the OpenLIT database. | `True` |
| `run_id` | `str` | Identifier to group related evaluations. | `None` |
| `metadata` | `dict` | Custom key-value metadata to attach to results. | `None` |
| `openlit_api_key` | `str` | API key (overrides `init()` and env var). | `None` |
| `openlit_url` | `str` | Server URL (overrides `init()` and env var). | `None` |
| `print_results` | `bool` | Print formatted summary to terminal. | `True` |
Result Object
openlit.eval() returns an OfflineEvalResult with these properties:
| Property | Type | Description |
|---|---|---|
| `success` | `bool` | Whether the evaluation completed without errors. |
| `passed` | `bool` | `True` if no evaluation types returned a "yes" verdict. |
| `evaluations` | `list[OfflineEvaluation]` | Individual evaluation results per type. |
| `failed_evals` | `list[OfflineEvaluation]` | Evaluations that returned a "yes" verdict. |
| `context_applied` | `ContextInfo` | Information about rule-matched context. |
| `metadata` | `dict` | Model, run ID, token usage, and cost metadata. |
| `error` | `str` | Error message if `success` is `False`. |
OfflineEvaluation contains:
| Field | Type | Description |
|---|---|---|
| `type` | `str` | The evaluation type (e.g. "hallucination"). |
| `score` | `float` | The evaluation score (0.0 to 1.0). |
| `verdict` | `str` | "yes" if detected, "no" otherwise. |
| `classification` | `str` | Category of the detection or "none". |
| `explanation` | `str` | Brief explanation of the evaluation result. |
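The result fields above can be consumed along these lines. This is a sketch only: `summarize` is a hypothetical helper, and `sample` is a stand-in object built with `SimpleNamespace` that mirrors the documented property names rather than a real `OfflineEvalResult`. The classification value shown is illustrative.

```python
from types import SimpleNamespace
from typing import Any, List


def summarize(result: Any) -> List[str]:
    """Turn an OfflineEvalResult-shaped object into human-readable lines."""
    if not result.success:
        return [f"evaluation error: {result.error}"]
    if result.passed:
        return ["all evaluations passed"]
    # failed_evals holds the evaluations whose verdict was "yes".
    return [
        f"{ev.type}: score={ev.score:.2f} ({ev.classification}) {ev.explanation}"
        for ev in result.failed_evals
    ]


# Stand-in with the documented fields, used instead of a live server call.
sample = SimpleNamespace(
    success=True,
    passed=False,
    error=None,
    failed_evals=[
        SimpleNamespace(
            type="hallucination",
            score=0.9,
            verdict="yes",
            classification="factual_inaccuracy",  # illustrative value
            explanation="Response contradicts the context.",
        )
    ],
)

for line in summarize(sample):
    print(line)
```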
Selecting Evaluation Types
Pass `eval_types` to run specific evaluation types instead of all enabled ones, for example `["hallucination", "toxicity"]`.
Discover Available Types
- Python
- TypeScript / JavaScript
Batch Evaluation
Evaluate multiple prompt/response pairs concurrently with `openlit.eval_batch()`.
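A minimal Python sketch of a batch run. `build_dataset` and `run_batch` are hypothetical helpers; `openlit.eval_batch()` and the `dataset`, `eval_types`, `max_concurrent`, and `print_results` parameters come from the parameter table for this function. The return value is passed through untouched because its shape is not described on this page.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def build_dataset(pairs: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Convert (prompt, response) tuples into the documented item shape."""
    return [{"prompt": prompt, "response": response} for prompt, response in pairs]


def run_batch(
    dataset: List[Dict[str, str]],
    max_concurrent: int = 5,
    evaluate_batch: Optional[Callable[..., Any]] = None,
) -> Any:
    """Send the whole dataset for evaluation in one call."""
    if evaluate_batch is None:
        # Imported lazily so this helper can be tested without the SDK or a server.
        import openlit

        evaluate_batch = openlit.eval_batch
    return evaluate_batch(
        dataset=dataset,
        eval_types=["hallucination", "toxicity"],
        max_concurrent=max_concurrent,
        print_results=True,
    )


dataset = build_dataset(
    [
        ("What is the capital of France?", "Paris."),
        ("Who wrote Hamlet?", "Charles Dickens."),
    ]
)
# run_batch(dataset)  # needs a reachable OpenLIT server
```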
openlit.eval_batch() Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `dataset` | `list[dict]` | List of items with `prompt` and `response` keys. Required. | n/a |
| `eval_types` | `list[str]` | Default eval types (can be overridden per item). | `None` |
| `attributes` | `dict` | Default attributes (can be overridden per item). | `None` |
| `threshold_score` | `float` | Default threshold score. | `0.5` |
| `store_results` | `bool` | Store all results in the database. | `True` |
| `run_id` | `str` | Group all batch evaluations under this ID. Auto-generated if omitted. | `None` |
| `max_concurrent` | `int` | Maximum number of concurrent evaluations. | `5` |
| `print_results` | `bool` | Print aggregate summary to terminal. | `True` |
Automatic Attribute Resolution
The SDK automatically resolves trace attributes for rule engine matching, enabling context-aware evaluations without extra configuration. The resolution order (last wins):
- `OTEL_RESOURCE_ATTRIBUTES` environment variable
- `OTEL_SERVICE_NAME` environment variable
- `OPENLIT_ENVIRONMENT` / `OTEL_DEPLOYMENT_ENVIRONMENT` environment variable
- `openlit.init()` configuration (`application_name`, `environment`)
- Explicit `attributes` parameter (highest priority)
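The "last wins" order can be illustrated with a small standalone Python sketch. It is not the SDK's implementation. Several details are assumptions made for the example: the attribute keys `service.name` and `deployment.environment`, the choice to prefer `OPENLIT_ENVIRONMENT` over `OTEL_DEPLOYMENT_ENVIRONMENT` when both are set, and the mapping of `application_name` and `environment` onto those keys.

```python
from typing import Dict, Mapping, Optional


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Parse the OpenTelemetry 'key1=value1,key2=value2' format."""
    attrs: Dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return attrs


def resolve_attributes(
    env: Mapping[str, str],
    init_config: Optional[Mapping[str, str]] = None,
    explicit: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge attribute sources so that later sources override earlier ones."""
    init_config = init_config or {}
    resolved = parse_resource_attributes(env.get("OTEL_RESOURCE_ATTRIBUTES", ""))
    if env.get("OTEL_SERVICE_NAME"):
        resolved["service.name"] = env["OTEL_SERVICE_NAME"]
    # Assumption: OPENLIT_ENVIRONMENT is preferred when both variables are set.
    environment = env.get("OPENLIT_ENVIRONMENT") or env.get("OTEL_DEPLOYMENT_ENVIRONMENT")
    if environment:
        resolved["deployment.environment"] = environment
    if init_config.get("application_name"):
        resolved["service.name"] = init_config["application_name"]
    if init_config.get("environment"):
        resolved["deployment.environment"] = init_config["environment"]
    # Explicit attributes are applied last, so they have the highest priority.
    resolved.update(explicit or {})
    return resolved


example_env = {
    "OTEL_RESOURCE_ATTRIBUTES": "service.name=from-resource,team=search",
    "OTEL_SERVICE_NAME": "from-service-name",
    "OPENLIT_ENVIRONMENT": "staging",
}
print(
    resolve_attributes(
        example_env,
        init_config={"application_name": "chatbot"},
        explicit={"deployment.environment": "production"},
    )
)
```

In this example `team` survives from `OTEL_RESOURCE_ATTRIBUTES`, `service.name` ends up as the `init()` value, and the explicit attribute overrides the environment.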
CI/CD Integration
Use offline evaluations in your test suite or CI pipeline.
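One way to gate a pipeline on evaluation results is sketched below in Python. `assert_eval_passed` and `run_grounding_check` are hypothetical helpers, and the prompt, response, and context strings are placeholders. The sketch relies only on the documented `success`, `passed`, `error`, and `failed_evals` properties and on the `eval_types`, `store_results`, and `print_results` parameters.

```python
from typing import Any, Callable, Optional


def assert_eval_passed(result: Any) -> None:
    """Raise AssertionError when the evaluation errored or flagged anything."""
    if not result.success:
        raise AssertionError(f"evaluation did not complete: {result.error}")
    if not result.passed:
        failed = ", ".join(f"{ev.type} (score {ev.score})" for ev in result.failed_evals)
        raise AssertionError(f"failed evaluations: {failed}")


def run_grounding_check(evaluate: Optional[Callable[..., Any]] = None) -> None:
    """Call this from a test function; a failed check fails the test."""
    if evaluate is None:
        # Imported lazily so this helper can be tested without the SDK or a server.
        import openlit

        evaluate = openlit.eval
    result = evaluate(
        prompt="What is the capital of France?",
        response="The capital of France is Paris.",
        contexts=["Paris is the capital of France."],
        eval_types=["hallucination"],
        store_results=False,  # keep CI runs out of the dashboard
        print_results=False,  # keep CI logs quiet
    )
    assert_eval_passed(result)
```

Setting `store_results=False` is a design choice for this example, not a requirement; leave it at the default if CI runs should appear alongside other results.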
Configuration Precedence
For `openlit_api_key` and `openlit_url`, the resolution order is:
- Explicit function parameter (highest priority)
- `openlit.init()` configuration
- `OPENLIT_API_KEY` / `OPENLIT_URL` environment variables
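The precedence above amounts to "first non-empty value wins". The following Python sketch illustrates that rule in isolation. `resolve_setting` is a hypothetical helper, not part of the SDK, and the `init_value` argument stands in for whatever was passed to `openlit.init()`, whose parameter names are not listed on this page.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    init_value: Optional[str],
    env_var: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty value: explicit, then init(), then environment."""
    env = os.environ if env is None else env
    for candidate in (explicit, init_value, env.get(env_var)):
        if candidate:
            return candidate
    return None


example_env = {"OPENLIT_URL": "http://env-host:3000", "OPENLIT_API_KEY": "env-key"}

# The explicit parameter beats everything else.
print(resolve_setting("explicit-key", "init-key", "OPENLIT_API_KEY", example_env))
# Without an explicit value, the init() value beats the environment.
print(resolve_setting(None, "http://init-host:3000", "OPENLIT_URL", example_env))
# With neither, the environment variable is used.
print(resolve_setting(None, None, "OPENLIT_API_KEY", example_env))
```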

