Get started with Evaluations
Quickly evaluate your model responses for Hallucination, Bias, and Toxicity
This guide will help you set up evaluations to assess your model’s responses. With OpenLIT, you can evaluate and score text for Hallucination, Bias, and Toxicity.
We’ll demonstrate how to use the `All` evaluator, which checks for all three aspects in one go, using `openlit.evals`. Additionally, we’ll show you how to collect OpenTelemetry metrics during the evaluation process.
Initialize evaluations in Your Application
Add the following two lines to your application code:
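A minimal sketch of those lines, assuming the default `openlit.evals.All()` constructor and its `measure()` method (the exact arguments, such as the judge LLM provider and API key, are listed in the Evaluations Guide):

```python
import openlit

# Create the "All" evaluator, which scores Hallucination, Bias, and Toxicity in one pass
evals = openlit.evals.All()

# Score a model response against its prompt and any supporting contexts
result = evals.measure(prompt="...", contexts=["..."], text="...")
```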
Full Example:
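The sketch below uses illustrative data; the `provider` and `api_key` arguments follow the Evaluations Guide, so check there for the exact parameter names and supported values:

```python
import openlit

# The evaluator uses an LLM as the judge; pass the provider and API key explicitly,
# or rely on the provider's environment variable (e.g. OPENAI_API_KEY)
evals = openlit.evals.All(provider="openai", api_key="YOUR_OPENAI_API_KEY")

# Ground-truth context, the original prompt, and the model response to evaluate
contexts = ["Einstein won the Nobel Prize in 1921 for his discovery of the photoelectric effect"]
prompt = "When and why did Einstein win the Nobel Prize?"
text = "Einstein won the Nobel Prize in 1969 for his discovery of the photoelectric effect"

# Run the evaluation; the result carries the verdict, score, and classification details
result = evals.measure(prompt=prompt, contexts=contexts, text=text)
print(result)
```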
The “All” evaluator is great for checking text for Hallucination, Bias, and Toxicity simultaneously. For more efficient, targeted evaluations, you can use specific evaluators like `openlit.evals.Hallucination()`, `openlit.evals.Bias()`, or `openlit.evals.Toxicity()`.
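For example, a hallucination-only check could look like the sketch below; it reuses the same `measure()` call with a narrower evaluator, and the constructor arguments are an assumption carried over from the `All` example above:

```python
import openlit

# Check only for hallucination against the supplied contexts
hallucination_eval = openlit.evals.Hallucination(provider="openai")
result = hallucination_eval.measure(
    prompt="When and why did Einstein win the Nobel Prize?",
    contexts=["Einstein won the Nobel Prize in 1921 for his discovery of the photoelectric effect"],
    text="Einstein won the Nobel Prize in 1969 for his discovery of the photoelectric effect",
)
```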
For details on how it works, and to see the supported providers, models and parameters to pass, check our Evaluations Guide.
Collecting Evaluation metrics
The `openlit.evals` module integrates with OpenTelemetry to track evaluation metrics as a counter, including score details and evaluation metadata. To enable metric collection, initialize OpenLIT with metrics tracking:
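A sketch of what that looks like, assuming the `collect_metrics` flag documented in the Evaluations Guide:

```python
import openlit

# Set up OpenLIT's OpenTelemetry instrumentation (meter and tracer)
openlit.init()

# Emit a counter metric with the score and evaluation metadata for each measure() call
evals = openlit.evals.All(collect_metrics=True)
```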
These metrics can be sent to any OpenTelemetry-compatible backend. For configuration details, check out our Connections Guide to choose your preferred destination for these metrics.
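For instance, exporting to a local OTLP collector might look like the sketch below; the `otlp_endpoint` argument (and the standard `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable it mirrors) is described in the Connections Guide:

```python
import openlit

# Send metrics and traces to an OpenTelemetry collector over OTLP/HTTP
openlit.init(otlp_endpoint="http://127.0.0.1:4318")
```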
You’re all set! By following these steps, you can effectively evaluate the text generated by your models.
If you have any questions or need support, reach out to our community.