Tracing for Error Discovery
Large language models (LLMs) are powerful, but the non-deterministic nature of their output requires more than passive monitoring. Establishing reliability, accuracy, and safety in production environments requires continuous error discovery. Okareo provides error discovery that enhances observability to identify specific behaviors, alert on errors, and highlight shifts in baseline performance. This guide will help you set up Error Discovery for your LLM application using Okareo.
Why Error Discovery?
LLMs can generate unexpected outputs, hallucinate information, and degrade in quality over time. Error Discovery helps you:
- Accelerate time to resolution through early detection and evaluation.
- Catch unexpected behaviors before they become widespread.
- Track response trends and performance over time.
- Improve user experience with data-driven insights.
Setting Up Error Discovery
Integrate Okareo into Your LLM Requests
To enable runtime evaluation, modify your existing LLM client configuration to route requests through Okareo's proxy. This allows Okareo to automatically evaluate each completion and conversation in your application. Okareo provides both a cloud-hosted proxy and a self-hosted option; learn more in the proxy SDK documentation.
Python:

from openai import OpenAI

openai = OpenAI(
    base_url="https://proxy.okareo.com",
    default_headers={"api-key": "<OKAREO_API_KEY>"},
    api_key="<YOUR_LLM_PROVIDER_KEY>",
)
TypeScript:

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://proxy.okareo.com",
  defaultHeaders: { "api-key": "<OKAREO_API_KEY>" },
  apiKey: "<YOUR_LLM_PROVIDER_KEY>",
});

curl:
curl https://proxy.okareo.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR_LLM_PROVIDER_KEY>" \
-H "api-key: <OKAREO_API_KEY>" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "Answer the question with a single word based only on the following context: Capital of France is Berlin."
},
{
"role": "user",
"content": "Is Berlin the capital of France?"
}
]
}'
In these configurations, you set the base URL to Okareo's proxy endpoint and include your OKAREO_API_KEY in the headers. This setup enables Okareo to automatically classify and associate your data points with the appropriate checks, using built-in metrics or any custom metrics you've defined.
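Once the client is configured, requests are made exactly as they would be against the provider directly. The short Python sketch below mirrors the curl example above; the model name and prompt are illustrative, and every completion the call produces is traced and evaluated by Okareo.

from openai import OpenAI

# Client configured as above: all requests are routed through Okareo's proxy.
openai = OpenAI(
    base_url="https://proxy.okareo.com",
    default_headers={"api-key": "<OKAREO_API_KEY>"},
    api_key="<YOUR_LLM_PROVIDER_KEY>",
)

# An ordinary chat completion; Okareo traces and evaluates it automatically.
response = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "Answer the question with a single word based only on "
                       "the following context: Capital of France is Berlin.",
        },
        {"role": "user", "content": "Is Berlin the capital of France?"},
    ],
)
print(response.choices[0].message.content)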
Debugging Results


Okareo's Autonomous Evaluation inspects each LLM completion, applying numerous built-in metrics and any custom metrics you've specified. This continuous evaluation provides a stream of data you can use for real-time analytics and to simulate the impact of model and prompt updates.
At any time, you can collect a set of online completions to use as seeds for synthetic generation. With generated variations of real behaviors, you can run offline simulations to compare models and prompts or even establish regular end-to-end evaluations in continuous integration (CI) for deployment readiness.
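As a sketch of that workflow, the snippet below uses the Okareo Python SDK (pip install okareo) to turn a handful of collected completions into a seed scenario and generate variations from it. The method names create_scenario_set and generate_scenarios and their parameters are assumptions based on the SDK at the time of writing; check the SDK reference for current signatures.

from okareo import Okareo
from okareo_api_client.models import ScenarioSetCreate, SeedData

okareo = Okareo("<OKAREO_API_KEY>")

# Seed a scenario set from completions collected via the proxy
# (the input/result pairs here are illustrative placeholders).
seed_scenario = okareo.create_scenario_set(
    ScenarioSetCreate(
        name="Collected completions",
        seed_data=[
            SeedData(
                input_="Is Berlin the capital of France?",
                result="No",
            ),
        ],
    )
)

# Generate synthetic variations of the real behaviors for offline simulation.
generated = okareo.generate_scenarios(
    source_scenario=seed_scenario.scenario_id,
    name="Collected completions - variations",
    number_examples=3,
)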
Error Discovery with Okareo
Error Discovery is just the first step. Once issues are identified, effective evaluation is key to improving LLM performance. Okareo enables you to:
- Pinpoint problematic responses: Identify patterns in hallucinations, bias, or incorrect information.
- Aggregate failure cases: Detect common failure modes across multiple interactions.
- Surface underlying causes: Analyze prompts, response structures, and metadata to uncover root causes.
- Continuously improve: Use monitoring insights to refine model prompts, configurations, or even fine-tune models.
By integrating Okareo’s observability tools, you can quickly iterate on fixes and deploy improved LLM applications with confidence.
Next Steps
With Error Discovery in place, you can:
- Analyze Results: Use the Okareo dashboard to visualize performance metrics and detect patterns.
- Set Up Alerts: Configure alerts for anomalies or spikes in error rates.
- Optimize Prompts/Models: Iterate on your models based on the insights gained from evaluations.
- Establish Baselines: Run golden-set scenarios through regular evaluations to establish baselines over time (see the sketch below).
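To illustrate that last step, the sketch below registers a model and runs a golden-set scenario against it, as you might in a CI job. It again uses the Okareo Python SDK; OpenAIModel, TestRunType, the run_test parameters, and the check names are assumptions based on the SDK at the time of writing, so verify them against the SDK reference.

import os

from okareo import Okareo
from okareo.model_under_test import OpenAIModel
from okareo_api_client.models.test_run_type import TestRunType

okareo = Okareo(os.environ["OKAREO_API_KEY"])

# Register the model and prompt under test (names are illustrative).
model_under_test = okareo.register_model(
    name="support-bot",
    model=OpenAIModel(
        model_id="gpt-3.5-turbo",
        temperature=0,
        system_prompt_template="Answer the question with a single word.",
        user_prompt_template="{scenario_input}",
    ),
)

# Run the golden-set scenario as a baseline evaluation, e.g. in CI.
# Check names are illustrative; use the built-in or custom checks you rely on.
evaluation = model_under_test.run_test(
    scenario="<GOLDEN_SET_SCENARIO_ID>",
    name="Baseline evaluation",
    api_key=os.environ["OPENAI_API_KEY"],
    test_run_type=TestRunType.NL_GENERATION,
    checks=["consistency", "relevance"],
)
print(evaluation.app_link)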