Tracing for Error Discovery

Large language models (LLMs) are powerful, but the non-deterministic nature of their output requires more than passive monitoring. Establishing reliability, accuracy, and safety in production environments requires continuous error discovery. Okareo provides error discovery that enhances observability to identify specific behaviors, alert on errors, and highlight shifts in baseline performance. This guide will help you set up Error Discovery for your LLM application using Okareo.

Why Error Discovery?

LLMs can generate unexpected outputs, hallucinate information, and degrade in quality over time. Error Discovery helps you:

  • Accelerate time to resolution through early detection and evaluation.
  • Catch unexpected behaviors before they become widespread.
  • Track response trends and performance over time.
  • Improve user experience with data-driven insights.

Setting Up Error Discovery

Integrate Okareo into Your LLM Requests

To enable runtime evaluation, modify your existing LLM client configuration to route requests through Okareo's proxy. This lets Okareo automatically evaluate every completion and conversation in your application. Okareo provides both a cloud-hosted proxy and a self-hosted option; learn more in the proxy SDK documentation.

from openai import OpenAI

# Route requests through Okareo's proxy so each completion is evaluated.
openai = OpenAI(
    base_url="https://proxy.okareo.com",
    default_headers={"api-key": "<OKAREO_API_KEY>"},
    api_key="<YOUR_LLM_PROVIDER_KEY>",
)
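
Once the client is configured, requests are made exactly as before; only the transport changes. As a quick smoke test (the model name and prompt here are illustrative, not part of Okareo's setup):

# A normal OpenAI chat completion call; Okareo evaluates the
# response in the background via the proxy.
response = openai.chat.completions.create(
    model="gpt-4o-mini",  # any model your provider key supports
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
)
print(response.choices[0].message.content)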

In this configuration, you set base_url to Okareo's proxy endpoint and include your OKAREO_API_KEY in the default headers. This setup enables Okareo to automatically classify your data points and associate them with the appropriate checks, using built-in metrics or any custom metrics you've defined.

Debugging Results

[Image: Okareo Auto Evaluation]

Okareo's Autonomous Evaluation inspects each LLM completion, applying numerous built-in metrics and any custom metrics you've specified. This continuous evaluation provides a stream of data you can use for real-time analytics and to simulate the impact of model and prompt updates.

At any time, you can collect a set of online completions to use as seeds for synthetic generation. With generated variations of real behaviors, you can run offline simulations to compare models and prompts or even establish regular end-to-end evaluations in continuous integration (CI) for deployment readiness.
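
As a minimal sketch of that seeding step, assuming the Okareo Python SDK's scenario APIs (Okareo, SeedData, ScenarioSetCreate, create_scenario_set; verify names and signatures against the current SDK reference):

from okareo import Okareo
from okareo_api_client.models import ScenarioSetCreate, SeedData

# Assumed SDK surface -- class and method names may differ in the
# current release; check the Okareo Python SDK reference.
okareo = Okareo("<OKAREO_API_KEY>")

# Seed a scenario set with completions collected from the proxy
# (the input/result pair below is illustrative).
seeds = [
    SeedData(
        input_="Summarize our refund policy.",
        result="Refunds are issued within 30 days of purchase.",
    ),
]
scenario = okareo.create_scenario_set(
    ScenarioSetCreate(name="proxy-seeded-scenarios", seed_data=seeds)
)

The resulting scenario set can then drive generated variations and offline comparisons of candidate models and prompts.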

Error Discovery with Okareo

Error Discovery is just the first step. Once issues are identified, effective evaluation is key to improving LLM performance. Okareo enables you to:

  • Pinpoint problematic responses: Identify patterns in hallucinations, bias, or incorrect information.
  • Aggregate failure cases: Detect common failure modes across multiple interactions.
  • Surface underlying causes: Analyze prompts, response structures, and metadata to uncover root causes.
  • Continuously improve: Use monitoring insights to refine model prompts, configurations, or even fine-tune models.

By integrating Okareo’s observability tools, you can quickly iterate on fixes and deploy improved LLM applications with confidence.

Next Steps

With Error Discovery in place, you can:

  • Analyze Results: Use the Okareo dashboard to visualize performance metrics and detect patterns.
  • Set Up Alerts: Configure alerts for anomalies or spikes in error rates.
  • Optimize Prompts/Models: Iterate on your models based on the insights gained from evaluations.
  • Establish Baselines: Run golden-set scenarios in your evaluations to establish baselines over time (see the sketch below).
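
To make that baseline step concrete, a CI job might replay a golden scenario set against a registered model on every deploy. This is a sketch under assumptions: the names below (register_model, OpenAIModel, run_test, TestRunType, app_link) follow the Okareo Python SDK but should be verified against the current reference, and scenario is the set created earlier.

from okareo import Okareo
from okareo.model_under_test import OpenAIModel
from okareo_api_client.models.test_run_type import TestRunType

# Assumed SDK surface -- verify against the Okareo Python SDK reference.
okareo = Okareo("<OKAREO_API_KEY>")

mut = okareo.register_model(
    name="support-bot",
    model=OpenAIModel(
        model_id="gpt-4o-mini",  # illustrative model
        temperature=0,
        system_prompt_template="You are a helpful support agent.",
        user_prompt_template="{scenario_input}",
    ),
)

# Evaluate the golden set; track this run as the baseline over time.
run = mut.run_test(
    scenario=scenario,  # the golden/seeded scenario set from earlier
    name="ci-baseline",
    api_key="<YOUR_LLM_PROVIDER_KEY>",
    test_run_type=TestRunType.NL_GENERATION,
)
print(run.app_link)  # assumed field linking to the run in the dashboard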