Persona & Behavior Simulation in Multi-Turn Dialogues
Okareo lets you simulate and evaluate full conversations - from straightforward question and answer flows to complex, agent-to-agent interactions. With Multi-Turn Simulations you can:
- Verify behaviors like persona adherance and task completion across an entire dialog.
- Stress-test your assistant with adversarial personals.
- Call out to custom endpoints (such as your own service or RAG pipeline) and evaluate the real responses.
- Track granular metrics and compare them over time.
Why Multi‑Turn Simulation?
Use Multi‑Turn when success depends on how the assistant behaves over time, not just what it says once.
Single‑Turn Evaluation | Multi‑Turn Simulation |
---|---|
Spot‑checks isolated responses. | Captures conversation dynamics: context, memory, tool calls, persona drift. |
Limited resistance to prompt injections. | Lets you inject adversarial or off‑happy‑path turns to probe robustness. |
Limited visibility into session state or external calls. | Can follow and score API calls, function‑calling, or custom‑endpoint responses throughout the dialog. |
Core Concepts
Key Entities
Term | What it is |
---|---|
Target | The system under test – either a hosted model (e.g. gpt-4o-mini ) or a configuration that tells Okareo how to call your service (Custom Endpoint / Custom Model). |
Driver | Scripted speaker that interacts with the target. A Driver row defines a persona and optional expected behaviors that the Target should satisfy. |
Scenario | A collection of Driver rows stored as a reusable asset. One Scenario can power many simulations. |
Custom Endpoint | A REST mapping (URL, method, headers, body template, JSON paths) that lets Okareo call your running agent, LLM pipeline, RAG service, etc. during a simulation. |
Custom Model | A class you implement by subclassing CustomModel in the Okareo SDK (Python or TypeScript). Provide an invoke() method and Okareo treats your proprietary code or on‑prem model as a first‑class, versioned model. |
Check | A metric that scores the dialog (numeric or boolean). Built‑ins cover behavior adherence, model refusal, task completion, etc.; you can supply custom checks for your use case. |
Execution Objects
Term | What it is |
---|---|
Simulation | A single run that alternates Driver → Target turns using one Scenario and one Target Setting. It records the conversations between target and driver. |
Evaluation | The scoring phase that executes all enabled Checks against the Simulation and produces metrics. |
Note: Driver ≠ Scenario A Scenario groups one or more Driver persona rows. Each simulation alternates turns between the Driver and the Target until a stop condition or maximum turns is reached.
How It Works (High‑Level)
- Prepare artifacts
- Driver Scenario (CSV or table in the UI)
- Target settings → either a Hosted Model or Custom Endpoint
- Checks to score the Simulation (optional)
- Run a Simulation from the Multi‑Turn Simulation → Simulations tab.
- Okareo runs the simulation
- If the Target is a Custom Endpoint, Okareo makes real HTTP calls using your mapping (URL, headers, body template).
- Checks are calculated at each turn.
- When a stop criterion is met, the run ends and Checks are computed a final time.
- Review results: scores and the full dialog side‑by‑side.
Quick‑Start via the UI
1 · Define a Target agent profile (Settings sub‑tab)
- Prompt – point to an existing hosted model (e.g.
gpt-4o-mini
) - Custom Endpoint – call your API. Provide:
- URL & HTTP method
- Headers / query params
- Body template (supports
{session_id}
,{latest_message}
,{message_history[i:j]}
variables) - Response Session ID Path (e.g.
response.thread_id
) - Response Message Path (e.g.
response.message
)
Tip: Unsure which JSONPath to use? Click Test Start Session to preview the raw response and adjust paths until the preview highlights the correct fields.
2 · Choose or Define a Driver Persona
- Switch to the Scenarios sub‑tab.
- Click + New Scenario and fill in:
- Driver Persona – natural‑language role description.
- Expected Behaviors – what success looks like.
3 · Launch a Simulation
- Switch to the Simulations sub-tab.
- + New Simulation → select Scenario, Settings, and Checks → Run.
- Monitor progress in real time; each tile shows key metrics once completed.
4 · Inspect Results
Click a Simulation tile to open its details. The results page breaks down the simulation into:
- Conversation Transcript – View the full back-and-forth between the Driver and Target, one turn per row.
- Checks – See results for:
- Behavior Adherence – Did the assistant stay in character or follow instructions?
- Model Refusal – Did the assistant properly decline off-topic or adversarial inputs?
- Task Completed – Did it fulfill the main objective?
- A custom check specific to your agent
Each turn is annotated with check results, so you can trace where things went wrong — or right.
Example: A Target correctly answered the task (“The capital of France is Paris”) but failed Model Refusal, as it should’ve declined the question based on the persona setup.
Advanced Topics
Adversarial Simulations & Tool‑Call Testing
- Add multiple rows to a Scenario that intentionally poke at edge cases (e.g. jailbreak attempts, bad‑actor personas).
- Use the Custom Endpoint Target to exercise your entire agent pipeline, including RAG, calls to vector DBs, or function‑calling chains.
- Combine with out-of-the-box Checks or custom checks you create.
SDK Helpers & Automation
- Programmatically create Scenarios and Settings with the Okareo Python or TypeScript SDK.
- Use the
MultiTurnDriver
class to craft sophisticated Driver behaviors (temperature, tool selection, stop policies, etc.). See the Python SDK reference or TypeScript SDK reference.
Prompt-Based vs. Custom-Endpoint Flow
Prompt-Based | Custom Endpoint | |
---|---|---|
Where logic lives | Model prompt only | Your HTTP service |
Ideal for | Rapid iteration, early prototyping | Complex RAG or tool-calling pipelines |