This quickstart uses the `All` evaluator for comprehensive LLM output assessment, measuring hallucination, bias, and toxicity simultaneously. We'll also show you how to collect OpenTelemetry evaluation metrics for continuous model performance monitoring.
1. Initialize evaluations
Set up evaluations for large language models with just two lines of code; a full example is sketched below. For advanced LLM evaluation metrics and supported providers, explore our Evaluations Guide.
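The snippet below is a minimal sketch of this step. `openlit.evals.All()` comes from OpenLIT's evals module; the `measure()` parameter names and the sample prompt, context, and text values are assumptions here, so check the Evaluations Guide for the exact signature and provider options.

```python
# example.py
import openlit

# Two lines: import OpenLIT and create the "All" evaluator, which scores
# hallucination, bias, and toxicity in a single pass. An LLM-as-a-judge
# provider is used under the hood, so the corresponding API key must be
# available in the environment.
detector = openlit.evals.All()

# Evaluate a model response against the prompt and supporting context.
result = detector.measure(
    prompt="Discuss Einstein's achievements",
    contexts=["Einstein won the Nobel Prize in 1921 for the photoelectric effect."],
    text="Einstein won the Nobel Prize in 1969 for the theory of relativity.",
)
print(result)  # evaluation result for the generated text
```

Step 2 shows how to export these evaluation scores as OpenTelemetry metrics.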
The `All` evaluator assesses model outputs for hallucination detection, bias detection, and toxicity filtering simultaneously. For targeted model evaluation, use a specific evaluator (see the sketch after this list):

- Hallucination detection: detect factual inaccuracies and false information in LLM responses
- Bias detection: identify potential biases and unfair representations in AI outputs
- Toxicity filtering: screen content for harmful, offensive, or inappropriate material
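As a sketch, the targeted evaluators below are assumed to mirror the category names above and to expose the same `measure()` interface as `All`; confirm the class names in the Evaluations Guide.

```python
import openlit

# Each targeted evaluator scores a single category instead of all three.
hallucination_check = openlit.evals.Hallucination()
bias_check = openlit.evals.Bias()
toxicity_check = openlit.evals.Toxicity()

# All of them are used the same way as the All evaluator:
# result = hallucination_check.measure(prompt=..., contexts=..., text=...)
```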
2. Track LLM evaluation metrics
To send evaluation scores to OpenTelemetry backends, your application needs to be instrumented via OpenLIT. Choose from three instrumentation methods, then simply add `collect_metrics=True` to track hallucination detection, bias screening, and toxicity filtering metrics. Metrics are sent to the same OpenTelemetry backend configured during instrumentation; check our supported destinations for configuration details.

No code changes to your LLM calls are needed - instrument via the CLI, then enable metric collection in your application as sketched below:
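A minimal sketch of the in-application part, reusing the evaluator from step 1; `collect_metrics=True` is the only addition, and the rest of the `measure()` call uses the same assumed API as above.

```python
import openlit

# Enabling collect_metrics exports evaluation scores as OpenTelemetry
# metrics to the backend configured during instrumentation.
detector = openlit.evals.All(collect_metrics=True)

result = detector.measure(
    prompt="Discuss Einstein's achievements",
    contexts=["Einstein won the Nobel Prize in 1921 for the photoelectric effect."],
    text="Einstein won the Nobel Prize in 1969 for the theory of relativity.",
)
# Hallucination, bias, and toxicity metrics are now recorded alongside the
# returned result.
print(result)
```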
- LLM-as-a-judge: automatically add evaluation scoring to production traces
- Integrations: 60+ AI integrations with automatic instrumentation and performance tracking
- Destinations: send telemetry to Datadog, Grafana, New Relic, and other observability stacks
Running in Kubernetes? Try the OpenLIT Operator
Automatically inject instrumentation into existing workloads without modifying pod specs, container images, or application code.