This guide will help you set up evaluations to assess your model’s responses. With OpenLIT, you can evaluate and score text for Hallucination, Bias, and Toxicity.

We’ll demonstrate how to use the All evaluator, which checks for all three aspects in one go, using openlit.evals. Additionally, we’ll show you how to collect OpenTelemetry metrics during the evaluation process.

1. Initialize Evaluations in Your Application

Add the following lines to your application code:

import openlit

# The 'All' evaluator runs checks for Hallucination, Bias, and Toxicity
evals = openlit.evals.All()

# Score a model response; prompt, contexts, and text are your own values
# (see the full example below)
result = evals.measure(prompt=prompt, contexts=contexts, text=text)

Full Example:

import openlit

# The 'All' evaluator runs checks for Hallucination, Bias, and Toxicity
evals = openlit.evals.All()

contexts = ["Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921"]
prompt = "When and why did Einstein win the Nobel Prize?"

text = "Einstein won the Nobel Prize in 1969 for his discovery of the photoelectric effect"

result = evals.measure(prompt=prompt, contexts=contexts, text=text)
Output
verdict='yes' evaluation='hallucination' score=0.9 classification='factual_inaccuracy' explanation='The text incorrectly states that Einstein won the Nobel Prize in 1969, while the context specifies that he won it in 1921 for his discovery of the photoelectric effect, leading to a significant factual inconsistency.'
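
Once you have the result, you can act on it programmatically. The field names below (verdict, evaluation, score, classification, explanation) mirror the output shown above; treat this as a minimal sketch and confirm the exact shape of the result object in the Evaluations Guide:

# Flag responses that the evaluator judged problematic
if result.verdict == "yes":
    print(f"{result.evaluation} detected ({result.classification}), score={result.score}")
    print(result.explanation)
else:
    print("Response passed the checks")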

The “All” evaluator is great for checking text for Hallucination, Bias, and Toxicity in a single call. If you only need one of these checks, use a targeted evaluator instead: openlit.evals.Hallucination(), openlit.evals.Bias(), or openlit.evals.Toxicity(). Running a single check is more efficient than running all three.
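
For example, here is a minimal sketch of a targeted hallucination check, assuming the targeted evaluators accept the same measure(prompt=..., contexts=..., text=...) call as the All evaluator shown above:

import openlit

# Run only the hallucination check instead of all three
detector = openlit.evals.Hallucination()

contexts = ["Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921"]
prompt = "When and why did Einstein win the Nobel Prize?"
text = "Einstein won the Nobel Prize in 1969 for his discovery of the photoelectric effect"

result = detector.measure(prompt=prompt, contexts=contexts, text=text)
print(result)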

For details on how it works, and to see the supported providers, models and parameters to pass, check our Evaluations Guide.

2. Collect Evaluation Metrics

The openlit.evals module integrates with OpenTelemetry to track evaluation metrics as a counter, including score details and evaluation metadata. To enable metric collection, initialize OpenLIT with metrics tracking:

import openlit

# Initialize OpenLIT for metrics collection
openlit.init()

# Then, initialize the evaluator with metric tracking enabled
evals = openlit.evals.All(collect_metrics=True)

These metrics can be sent to any OpenTelemetry-compatible backend. For configuration details, check out our Connections Guide to choose your preferred destination for these metrics.
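
For example, here is a minimal sketch that exports the evaluation metrics to a local OpenTelemetry Collector over OTLP. The endpoint URL is an assumption for illustration; substitute the destination you configure via the Connections Guide:

import openlit

# Point OpenLIT's OpenTelemetry exporters at your collector
# (assumed local OTLP HTTP endpoint; replace with your own backend)
openlit.init(otlp_endpoint="http://127.0.0.1:4318")

# Evaluation scores are recorded as metrics in addition to being returned
evals = openlit.evals.All(collect_metrics=True)

# Reuse the prompt, contexts, and text from the full example above
result = evals.measure(prompt=prompt, contexts=contexts, text=text)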

You’re all set! By following these steps, you can effectively evaluate the text generated by your models.

If you have any questions or need support, reach out to our community.