Overview

The OpenLIT SDK provides server-side evaluations via openlit.eval() (Python) and openlit.eval({}) (JS/TS). Evaluations use the same engine, rules, contexts, and custom eval types configured in the OpenLIT dashboard, so they behave identically in development (offline) and production (online).

Offline Evaluations

Offline evaluations run on the OpenLIT server using the same evaluation engine as online/auto evaluations. The SDK sends your prompt and response to the server, which runs LLM-as-judge evaluation and returns structured results.

Prerequisites

  1. A running OpenLIT instance with evaluation configured in the dashboard.
  2. An OpenLIT API key (create one in the dashboard under Settings > API Keys).

Quick Start

import openlit

# Option 1: Configure once via init()
openlit.init(
    openlit_url="http://localhost:3000",
    openlit_api_key="openlit-xxxxx",
)

# Run evaluation
result = openlit.eval(
    prompt="What is the capital of France?",
    response="The capital of France is Paris.",
    contexts=["Paris is the capital and largest city of France."],
)

# Use in assertions
assert result.passed, f"Evaluation failed: {result.failed_evals}"

# Option 2: Pass credentials directly (overrides init/env vars)
result = openlit.eval(
    prompt="Explain quantum computing",
    response="Quantum computers use qubits...",
    openlit_url="http://localhost:3000",
    openlit_api_key="openlit-xxxxx",
)

You can also configure via environment variables:

export OPENLIT_URL="http://localhost:3000"
export OPENLIT_API_KEY="openlit-xxxxx"

openlit.eval() / openlit.eval({}) Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | The user prompt sent to the LLM. Required. | |
| response | str | The LLM's response to evaluate. Required. | |
| contexts | list[str] | Ground truth context for the evaluation. | None |
| eval_types | list[str] | Specific eval types to run (e.g. ["hallucination", "toxicity"]). Runs all enabled types if omitted. | None |
| attributes | dict | Trace attributes for rule engine matching (overrides auto-resolved attributes). | None |
| threshold_score | float | Score threshold for verdict determination. | 0.5 |
| store_results | bool | Whether to store results in the OpenLIT database. | True |
| run_id | str | Identifier to group related evaluations. | None |
| metadata | dict | Custom key-value metadata to attach to results. | None |
| openlit_api_key | str | API key (overrides init() and env var). | None |
| openlit_url | str | Server URL (overrides init() and env var). | None |
| print_results | bool | Print formatted summary to terminal. | True |

Result Object

openlit.eval() returns an OfflineEvalResult with these properties:

| Property | Type | Description |
| --- | --- | --- |
| success | bool | Whether the evaluation completed without errors. |
| passed | bool | True if no evaluation types returned a "yes" verdict. |
| evaluations | list[OfflineEvaluation] | Individual evaluation results per type. |
| failed_evals | list[OfflineEvaluation] | Evaluations that returned a "yes" verdict. |
| context_applied | ContextInfo | Information about rule-matched context. |
| metadata | dict | Model, run ID, token usage, and cost metadata. |
| error | str | Error message if success is False. |

Each OfflineEvaluation contains:

| Field | Type | Description |
| --- | --- | --- |
| type | str | The evaluation type (e.g. "hallucination"). |
| score | float | The evaluation score (0.0 to 1.0). |
| verdict | str | "yes" if detected, "no" otherwise. |
| classification | str | Category of the detection or "none". |
| explanation | str | Brief explanation of the evaluation result. |
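
The relationship between verdict, failed_evals, and passed described in the tables above can be sketched with stand-in dataclasses. EvalStub and ResultStub are hypothetical names used only for illustration; they are not the SDK's OfflineEvaluation or OfflineEvalResult classes, and only the rule that a "yes" verdict means "detected" is taken from the documentation.

```python
from dataclasses import dataclass, field


@dataclass
class EvalStub:
    """Hypothetical stand-in for one per-type evaluation entry."""
    type: str
    score: float
    verdict: str  # "yes" if the issue was detected, "no" otherwise


@dataclass
class ResultStub:
    """Hypothetical stand-in for the aggregate result object."""
    evaluations: list = field(default_factory=list)

    @property
    def failed_evals(self):
        # An evaluation counts as failed when its verdict is "yes".
        return [e for e in self.evaluations if e.verdict == "yes"]

    @property
    def passed(self):
        # Passed means no evaluation type returned a "yes" verdict.
        return not self.failed_evals


result = ResultStub([
    EvalStub("hallucination", 0.9, "yes"),
    EvalStub("toxicity", 0.1, "no"),
])
print(result.passed)
print([e.type for e in result.failed_evals])
```

Under this reading, a single "yes" verdict is enough to make passed false, which is why the assertion examples print failed_evals in their failure message.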

Selecting Evaluation Types

Run specific evaluation types instead of all enabled ones:

result = openlit.eval(
    prompt="Discuss workplace equality",
    response="Older workers can't learn new tech.",
    eval_types=["bias", "toxicity"],
)

Discover Available Types

types = openlit.get_eval_types()
for t in types:
    print(f"{t.id}: {t.label} (custom={t.is_custom}, enabled={t.enabled})")

Batch Evaluation

Evaluate multiple prompt/response pairs concurrently:

dataset = [
    {
        "prompt": "What is 2+2?",
        "response": "2+2 equals 4.",
        "contexts": ["Basic arithmetic."],
    },
    {
        "prompt": "Who wrote Hamlet?",
        "response": "Hamlet was written by William Shakespeare.",
    },
    {
        "prompt": "Describe gravity",
        "response": "Gravity is the force of attraction between masses.",
        "eval_types": ["hallucination"],
    },
]

batch_result = openlit.eval_batch(
    dataset=dataset,
    eval_types=["hallucination", "toxicity"],
    max_concurrent=5,
)

print(f"Pass rate: {batch_result.pass_rate:.0%}")
assert batch_result.all_passed

openlit.eval_batch() Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| dataset | list[dict] | List of items with prompt and response keys. Required. | |
| eval_types | list[str] | Default eval types (can be overridden per item). | None |
| attributes | dict | Default attributes (can be overridden per item). | None |
| threshold_score | float | Default threshold score. | 0.5 |
| store_results | bool | Store all results in the database. | True |
| run_id | str | Group all batch evaluations under this ID. Auto-generated if omitted. | None |
| max_concurrent | int | Maximum number of concurrent evaluations. | 5 |
| print_results | bool | Print aggregate summary to terminal. | True |
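
How per-item overrides and max_concurrent interact can be sketched without a server. The example below is an illustration only: fake_eval and run_batch are hypothetical stand-ins, and the thread pool is an assumption, since this page does not document how the SDK schedules concurrent evaluations.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_eval(prompt, response, eval_types=None):
    """Hypothetical stand-in for one evaluation call; it always passes."""
    return {"prompt": prompt, "eval_types": eval_types, "passed": True}


def run_batch(dataset, eval_types=None, max_concurrent=5):
    def run_one(item):
        # A per-item "eval_types" key overrides the batch-level default.
        types = item.get("eval_types", eval_types)
        return fake_eval(item["prompt"], item["response"], eval_types=types)

    # The pool size caps how many evaluations are in flight at once.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(run_one, dataset))  # map() keeps input order

    # Assumption: an empty dataset yields a pass rate of 0.0.
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return results, pass_rate


results, pass_rate = run_batch(
    [
        {"prompt": "What is 2+2?", "response": "2+2 equals 4."},
        {
            "prompt": "Describe gravity",
            "response": "Gravity is the force of attraction between masses.",
            "eval_types": ["hallucination"],
        },
    ],
    eval_types=["hallucination", "toxicity"],
    max_concurrent=2,
)
print(f"Pass rate: {pass_rate:.0%}")
```

The first item inherits the batch-level eval types, while the second keeps its own list.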

Automatic Attribute Resolution

The SDK automatically resolves trace attributes for rule engine matching, enabling context-aware evaluations without extra configuration. Sources are applied in the order below, and a later source overrides an earlier one for the same key:
  1. OTEL_RESOURCE_ATTRIBUTES environment variable
  2. OTEL_SERVICE_NAME environment variable
  3. OPENLIT_ENVIRONMENT / OTEL_DEPLOYMENT_ENVIRONMENT environment variable
  4. openlit.init() configuration (application_name, environment)
  5. Explicit attributes parameter (highest priority)

import openlit

# These are auto-detected for rule matching:
openlit.init(
    application_name="my-chatbot",
    environment="staging",
)

# Rules configured in the dashboard for service.name="my-chatbot"
# and deployment.environment="staging" will automatically match.
result = openlit.eval(
    prompt="Hello",
    response="Hi there!",
)

# Override auto-resolved attributes:
result = openlit.eval(
    prompt="Hello",
    response="Hi there!",
    attributes={
        "service.name": "different-service",
        "custom.tag": "experiment-v2",
    },
)
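
The resolution order listed in this section amounts to a layered dictionary merge. The sketch below shows one way it could work; resolve_attributes and parse_resource_attributes are hypothetical helpers, not SDK functions. The mapping of application_name to service.name and of environment to deployment.environment follows the comments in the example above. Preferring OPENLIT_ENVIRONMENT over OTEL_DEPLOYMENT_ENVIRONMENT when both are set is an assumption.

```python
def parse_resource_attributes(raw):
    """Parse the comma-separated key=value format of OTEL_RESOURCE_ATTRIBUTES.

    Simplified: values are not percent-decoded.
    """
    attrs = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return attrs


def resolve_attributes(env, init_config=None, explicit=None):
    """Hypothetical helper: merge attribute sources so that later ones win."""
    init_config = init_config or {}
    attrs = {}
    # 1. OTEL_RESOURCE_ATTRIBUTES
    attrs.update(parse_resource_attributes(env.get("OTEL_RESOURCE_ATTRIBUTES", "")))
    # 2. OTEL_SERVICE_NAME
    if env.get("OTEL_SERVICE_NAME"):
        attrs["service.name"] = env["OTEL_SERVICE_NAME"]
    # 3. OPENLIT_ENVIRONMENT / OTEL_DEPLOYMENT_ENVIRONMENT (assumed preference order)
    deploy_env = env.get("OPENLIT_ENVIRONMENT") or env.get("OTEL_DEPLOYMENT_ENVIRONMENT")
    if deploy_env:
        attrs["deployment.environment"] = deploy_env
    # 4. init() configuration
    if init_config.get("application_name"):
        attrs["service.name"] = init_config["application_name"]
    if init_config.get("environment"):
        attrs["deployment.environment"] = init_config["environment"]
    # 5. Explicit attributes parameter (highest priority)
    attrs.update(explicit or {})
    return attrs


resolved = resolve_attributes(
    env={
        "OTEL_RESOURCE_ATTRIBUTES": "service.name=from-env,team=search",
        "OTEL_DEPLOYMENT_ENVIRONMENT": "production",
    },
    init_config={"application_name": "my-chatbot", "environment": "staging"},
    explicit={"custom.tag": "experiment-v2"},
)
print(resolved)
```

Keys that only an earlier source defines (team in this example) survive the merge; only conflicting keys are overridden.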

CI/CD Integration

Use offline evaluations in your test suite or CI pipeline:

import openlit
import pytest

def test_no_hallucination():
    result = openlit.eval(
        prompt="What year did WW2 end?",
        response="World War 2 ended in 1945.",
        eval_types=["hallucination"],
        print_results=False,
    )
    assert result.passed, f"Hallucination detected: {result.failed_evals}"

def test_batch_quality():
    dataset = load_test_cases()  # your test data
    result = openlit.eval_batch(
        dataset=dataset,
        print_results=False,
    )
    assert result.pass_rate >= 0.95, f"Pass rate too low: {result.pass_rate:.0%}"
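
The load_test_cases() call above is left to the reader. One possible implementation, assuming test cases live in a JSON Lines file with one object per line, is sketched below. The file layout and the path argument are assumptions and not part of the SDK; the only documented requirement is that each item has prompt and response keys.

```python
import json
import tempfile
from pathlib import Path


def load_test_cases(path):
    """Hypothetical loader: one JSON object per line with prompt/response keys."""
    dataset = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, 1):
        if not line.strip():
            continue  # skip blank lines
        item = json.loads(line)
        missing = {"prompt", "response"} - item.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        dataset.append(item)
    return dataset


# Demo with a temporary file standing in for a real test-case file.
with tempfile.TemporaryDirectory() as tmp:
    cases_file = Path(tmp) / "cases.jsonl"
    cases_file.write_text(
        '{"prompt": "What year did WW2 end?", "response": "World War 2 ended in 1945."}\n'
        "\n"
        '{"prompt": "What is 2+2?", "response": "2+2 equals 4.", "contexts": ["Basic arithmetic."]}\n',
        encoding="utf-8",
    )
    dataset = load_test_cases(cases_file)

print(len(dataset))
```

Validating the required keys at load time makes a malformed test case fail with a line number before any evaluation request is sent.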

Configuration Precedence

For openlit_api_key and openlit_url, the resolution order is:
  1. Explicit function parameter (highest priority)
  2. openlit.init() configuration
  3. OPENLIT_API_KEY / OPENLIT_URL environment variables
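
The precedence above can be expressed as a short fallback chain. This is a sketch under stated assumptions: resolve_setting is a hypothetical helper, not an SDK function, and it treats None as "not provided".

```python
import os


def resolve_setting(explicit, init_value, env_var, environ=None):
    """Hypothetical helper: explicit parameter > init() value > environment variable."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        return explicit
    if init_value is not None:
        return init_value
    return environ.get(env_var)


# A plain dict stands in for the process environment in this demo.
env = {"OPENLIT_URL": "http://env-host:3000"}

from_env = resolve_setting(None, None, "OPENLIT_URL", env)
from_init = resolve_setting(None, "http://init-host:3000", "OPENLIT_URL", env)
from_param = resolve_setting(
    "http://explicit-host:3000", "http://init-host:3000", "OPENLIT_URL", env
)
print(from_env, from_init, from_param)
```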
