# Programmatic Evaluations

> Quickly evaluate your LLMs and AI Agent responses for Hallucination, Bias, and Toxicity

This guide shows how to evaluate LLM output quality in code. With OpenLIT's programmatic evaluations, you can detect hallucination, bias, and toxicity in model responses using production-ready evaluation metrics.

Learn how to use our `All` evaluator for comprehensive LLM output assessment, measuring hallucination, bias, and toxicity simultaneously. We'll also show you how to collect OpenTelemetry evaluation metrics for continuous model performance monitoring.

<Steps>
  <Step title="Initialize evaluations">
    Set up evaluations for large language models with just two lines of code:

    <Tabs>
      <Tab title="Python">
        ```python theme={null}
        import openlit

        # Comprehensive LLM evaluation: hallucination detection, bias detection, toxicity filtering
        evals = openlit.evals.All()
        result = evals.measure()
        ```

        Full Example:

        ```python example.py theme={null}
        import os

        import openlit

        # OpenLIT can also read OPENAI_API_KEY directly from the environment if api_key is not passed
        openai_api_key = os.getenv("OPENAI_API_KEY")

        # Production-ready LLM evaluation tools for hallucination detection and bias screening
        evals = openlit.evals.All(provider="openai", api_key=openai_api_key)

        contexts = ["Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921"]
        prompt = "When and why did Einstein win the Nobel Prize?"
        text = "Einstein won the Nobel Prize in 1969 for his discovery of the photoelectric effect"

        result = evals.measure(prompt=prompt, contexts=contexts, text=text)
        print(result)
        ```

        ```sh Output theme={null}
        verdict='yes' evaluation='Hallucination' score=0.9 classification='factual_inaccuracy' explanation='The text incorrectly states that Einstein won the Nobel Prize in 1969, while the context specifies that he won it in 1921 for his discovery of the photoelectric effect, leading to a significant factual inconsistency.'
        ```
      </Tab>

      <Tab title="TypeScript">
        ```typescript theme={null}
        import openlit from "openlit"

        // Comprehensive LLM evaluation: hallucination detection, bias detection, toxicity filtering
        const evals = new openlit.evals.All()
        const result = await evals.measure()
        ```

        Full Example:

        ```typescript theme={null}
        import openlit from "openlit"

        // Production-ready LLM evaluation tools for hallucination detection and bias screening
        const evals = new openlit.evals.All({
            provider: "openai",
            apiKey: process.env.OPENAI_API_KEY,
        })

        const contexts = ["Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921"];
        const prompt = "When and why did Einstein win the Nobel Prize?";
        const text = "Einstein won the Nobel Prize in 1969 for his discovery of the photoelectric effect";

        const result = await evals.measure({ prompt, contexts, text });
        console.log(result)
        ```
      </Tab>
    </Tabs>

    The `All` evaluator assesses model outputs for hallucination detection, bias detection, and toxicity filtering simultaneously. For targeted model evaluation, use specific evaluators:

    <CardGroup cols={3}>
      <Card title="Hallucination detection" href="/latest/sdk/features/evaluations#hallucination-detection" icon="circle-exclamation">
        Detect factual inaccuracies and false information in LLM responses
      </Card>

      <Card title="Bias detection" href="/latest/sdk/features/evaluations#bias-detection" icon="people-arrows">
        Identify potential biases and unfair representations in AI outputs
      </Card>

      <Card title="Toxicity filtering" href="/latest/sdk/features/evaluations#toxicity-detection" icon="shield-virus">
        Screen content for harmful, offensive, or inappropriate material
      </Card>
    </CardGroup>

    For advanced LLM evaluation metrics and supported providers, explore our [Evaluations Guide](/latest/sdk/features/evaluations).
  </Step>

  <Step title="Track LLM evaluation metrics">
    To send evaluation scores to OpenTelemetry backends, your application needs to be instrumented via OpenLIT. Choose one of the instrumentation methods below, then add `collect_metrics=True` (`collectMetrics: true` in TypeScript) to track hallucination, bias, and toxicity metrics.

    <Tabs>
      <Tab title="Zero-Code instrumentation">
        No code changes needed - instrument via CLI:

        ```bash theme={null}
        # Run with zero-code instrumentation
        openlit-instrument python your_app.py
        ```

        Then in your application:

        ```python theme={null}
        import openlit

        # Enable evaluation metrics tracking - OpenLIT instrumentation handles the rest
        evals = openlit.evals.All(collect_metrics=True)
        result = evals.measure(prompt=prompt, contexts=contexts, text=text)
        ```
      </Tab>

      <Tab title="Manual instrumentation">
        Add OpenLIT initialization to your application:

        ```python theme={null}
        import openlit

        # Initialize OpenLIT for LLM evaluation metrics collection
        openlit.init()

        # Enable evaluation metric tracking for hallucination detection and bias screening
        evals = openlit.evals.All(collect_metrics=True)
        result = evals.measure(prompt=prompt, contexts=contexts, text=text)
        ```

        TypeScript example:

        ```typescript theme={null}
        import openlit from "openlit"

        // Initialize OpenLIT instrumentation
        openlit.init()

        // Automatic LLM evaluation metrics collection
        const evals = new openlit.evals.All({ collectMetrics: true });
        const result = await evals.measure({ prompt, contexts, text });
        ```
      </Tab>
    </Tabs>

    Metrics are sent to the same OpenTelemetry backend configured during instrumentation. See our [supported destinations](/latest/sdk/destinations/overview) for configuration details.
  </Step>
</Steps>
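
Once evaluations are running, a common next step is to act on the result. The sketch below is illustrative only: `EvalResult` is a hypothetical stand-in that mirrors the fields in the sample output from the first step (`verdict`, `evaluation`, `score`, `classification`, `explanation`), and `should_block` with its `threshold` is an assumed helper, not part of the OpenLIT SDK. It also assumes that `verdict='yes'` means an issue was detected, as the sample output suggests.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical stand-in mirroring the fields in the sample output."""
    verdict: str
    evaluation: str
    score: float
    classification: str
    explanation: str


def should_block(result, threshold: float = 0.5) -> bool:
    """Return True when the text was flagged at or above the threshold.

    Assumes verdict == "yes" means an issue was detected (inferred from
    the sample output); the default threshold is an arbitrary choice.
    """
    return result.verdict == "yes" and result.score >= threshold


# Abbreviated copy of the sample output, used here in place of a live call.
sample = EvalResult(
    verdict="yes",
    evaluation="Hallucination",
    score=0.9,
    classification="factual_inaccuracy",
    explanation="The text states 1969, while the context says 1921.",
)

if should_block(sample):
    print(f"Blocked ({sample.evaluation}, score={sample.score}): {sample.explanation}")
```

In an application, you would pass the object returned by `evals.measure(...)` instead of `sample`, provided it exposes the same attributes, and tune the threshold to your own tolerance for false positives.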

You're all set! Your AI applications now have complete model evaluation capabilities with automated hallucination detection, bias screening, and toxicity filtering. Monitor LLM output quality with real-time evaluation metrics.

If you have any questions or need support, reach out to our [community](https://join.slack.com/t/openlit/shared_invite/zt-2etnfttwg-TjP_7BZXfYg84oAukY8QRQ).

***

<CardGroup cols={3}>
  <Card title="LLM-as-a-judge" href="/latest/openlit/evaluations/llm-as-a-judge" icon="gavel">
    Automatically add evaluation scoring to production traces
  </Card>

  <Card title="Integrations" href="/latest/sdk/integrations/overview" icon="circle-nodes">
    60+ AI integrations with automatic instrumentation and performance tracking
  </Card>

  <Card title="Destinations" href="/latest/sdk/destinations/overview" icon="link">
    Send telemetry to Datadog, Grafana, New Relic, and other observability stacks
  </Card>
</CardGroup>

<Card title="Zero-code observability with the OpenLIT Controller" icon="tower-broadcast" href="/latest/controller/overview">
  Discover and instrument LLM traffic across Kubernetes, Docker, and Linux using eBPF — no code changes required.
</Card>
