Persona & Behavior Simulation in Multi-Turn Dialogues

Okareo lets you simulate and evaluate full conversations, from straightforward question-and-answer flows to complex agent-to-agent interactions. With Multi-Turn Simulations you can:

Verify behaviors like persona adherence and task completion across an entire dialog.
Stress-test your assistant with adversarial personas.
Call out to custom endpoints (such as your own service or RAG pipeline) and evaluate the real responses.
Track granular metrics and compare them over time.

Why Multi‑Turn Simulation?

Use Multi‑Turn when success depends on how the assistant behaves over time, not just what it says once.

Single‑Turn Evaluation	Multi‑Turn Simulation
Spot‑checks isolated responses.	Captures conversation dynamics: context, memory, tool calls, persona drift.
Limited ability to test resistance to prompt injection.	Lets you inject adversarial or off‑happy‑path turns to probe robustness.
Limited visibility into session state or external calls.	Can follow and score API calls, function‑calling, or custom‑endpoint responses throughout the dialog.

Core Concepts

Key Entities

Term	What it is
Setting Profile	Configuration blueprint that defines both the Target under test and the Driver Agent that will converse with it. Controls temperatures, stop logic, prompt template, repeats, and more.
Target	The system under test: either a hosted model (e.g. `gpt-4o-mini`) or a mapping that tells Okareo how to call your service (Custom Endpoint / Custom Model).
Driver Agent	A configurable simulation of a user persona defined in the Setting Profile. It sends messages to the Target according to its prompt template, temperature, and flow controls.
Scenario	A reusable collection of Scenario Rows. Each row provides runtime parameters (inserted into the Driver prompt) plus an expected result for checks to judge against.
Custom Endpoint	A REST mapping (URL, method, headers, body template, JSON paths) that lets Okareo call your running agent, LLM pipeline, RAG service, etc. during a simulation.
Custom Model	A class you implement by subclassing `CustomModel` in the Okareo SDK (Python or TypeScript). Provide an `invoke()` method and Okareo treats your proprietary code or on‑prem model as a first‑class, versioned model.
Check	A metric that scores the dialog (numeric or boolean). Built‑ins cover behavior adherence, model refusal, task completion, etc.; you can supply custom checks for your use case.

Driver vs. Scenario: A Setting Profile defines one Driver Agent. A Scenario holds data rows that are injected into that Driver’s prompt. During a simulation, the Driver and Target alternate turns until a stop condition is met or the maximum number of turns is reached.

Execution Objects

Term	What it is
Simulation	A single run that alternates `Driver → Target` turns using one Scenario and one Setting Profile. It records the full conversation between the Driver and the Target.
Evaluation	The scoring phase that executes all enabled Checks against the Simulation and produces metrics.

How It Works (High‑Level)

Prepare artifacts
- Setting Profile – selects a Target (Hosted Model or Custom Endpoint) and configures the Driver Agent (temperature, prompt template, stop logic, repeats, etc.).
- Scenario – table of Scenario Rows (runtime parameters + expected results).
- Checks – metrics that will score the Simulation (optional).
Run a Simulation from the Multi‑Turn Simulation → Simulations tab.
Okareo runs the simulation
- If the Target is a Custom Endpoint, Okareo makes real HTTP calls using your mapping (URL, headers, body template).
- Checks are calculated at each turn.
When a stop criterion is met, the run ends and Checks are computed a final time.
Review results: scores and the full dialog side‑by‑side.
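
The flow above can be pictured as a small loop. What follows is a minimal sketch in plain Python, not Okareo's implementation: `run_simulation`, `SimulationResult`, and the toy driver, target, and check functions are all hypothetical names introduced for illustration. It only shows the turn alternation, the per-turn checks, the stop condition, and the final check pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Message = Dict[str, str]                  # {"role": "driver" | "target", "content": "..."}
Speaker = Callable[[List[Message]], str]  # reads the transcript, returns the next message
Check = Callable[[List[Message]], bool]   # scores the transcript so far


@dataclass
class SimulationResult:
    transcript: List[Message] = field(default_factory=list)
    per_turn_checks: List[Dict[str, bool]] = field(default_factory=list)
    final_checks: Dict[str, bool] = field(default_factory=dict)


def run_simulation(driver: Speaker, target: Speaker, checks: Dict[str, Check],
                   stop_check: str, max_turns: int) -> SimulationResult:
    """Alternate Driver -> Target turns until the stop check passes or max_turns is hit."""
    result = SimulationResult()
    for _ in range(max_turns):
        result.transcript.append({"role": "driver", "content": driver(result.transcript)})
        result.transcript.append({"role": "target", "content": target(result.transcript)})
        # Checks are calculated at each turn.
        scores = {name: check(result.transcript) for name, check in checks.items()}
        result.per_turn_checks.append(scores)
        if scores[stop_check]:
            break
    # Checks are computed a final time once the run has ended.
    result.final_checks = {name: check(result.transcript) for name, check in checks.items()}
    return result


# Toy stand-ins for a Driver, a Target, and a check (all hypothetical).
def toy_driver(transcript: List[Message]) -> str:
    return "What is the capital of France?" if not transcript else "Are you sure?"


def toy_target(transcript: List[Message]) -> str:
    return "The capital of France is Paris."


def task_completed(transcript: List[Message]) -> bool:
    return any(m["role"] == "target" and "Paris" in m["content"] for m in transcript)


result = run_simulation(toy_driver, toy_target,
                        {"task_completed": task_completed},
                        stop_check="task_completed", max_turns=5)
print(len(result.transcript), result.final_checks)
```

In this toy run the stop check passes after the first exchange, so the loop ends well before `max_turns`.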

Quick‑Start via the UI

1 · Define a Setting Profile (Settings sub‑tab)

Click ➕ New Profile.
Choose a Target:
- Hosted Model – pick from Okareo’s catalog (e.g. gpt-4o-mini).
- Custom Endpoint – map URL, headers, body template, and JSON paths.
Configure the Driver Agent:
- Driver Temperature, Repeats, Max Turns.
- First Speaker – Driver or Target.
- Driver Prompt Template – choose a template or write your own.
- Stop When – select a check that terminates the dialog.

Tip: Unsure which JSONPath to use? Click Test Start Session to preview the raw response and adjust paths until the preview highlights the correct fields.
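
To make the Custom Endpoint mapping more concrete, here is a rough sketch of the two things a mapping has to do: fill a body template and pull fields out of the response. Everything in it is an assumption for illustration. The `$name` placeholders are Python's `string.Template` syntax rather than Okareo's template syntax, the dotted path is a simplified stand-in for real JSONPath, and the response shape is invented.

```python
import json
from string import Template

# Hypothetical body template; "$name" is Python's string.Template syntax,
# not necessarily the placeholder syntax Okareo uses.
BODY_TEMPLATE = '{"session_id": "$session_id", "message": "$latest_message"}'


def render_body(template: str, values: dict) -> dict:
    """Fill the template with string values, escaping them so the result stays valid JSON."""
    escaped = {key: json.dumps(value)[1:-1] for key, value in values.items()}
    return json.loads(Template(template).substitute(escaped))


def extract(response, path: str):
    """Resolve a simplified path such as '$.reply.messages.0.text'.

    Only object keys and list indices are supported; this is not full JSONPath.
    """
    if path.startswith("$."):
        path = path[2:]
    node = response
    for part in path.split("."):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


body = render_body(BODY_TEMPLATE, {"session_id": "abc-123", "latest_message": 'Say "hi"'})

# Invented response, standing in for what your service might return.
fake_response = {"reply": {"messages": [{"text": "Hello!"}]}, "session": {"id": "abc-123"}}
reply_text = extract(fake_response, "$.reply.messages.0.text")
session_id = extract(fake_response, "$.session.id")
print(body, reply_text, session_id)
```

If a path resolves to the wrong field in a sketch like this, the simulated Driver would be replying to the wrong text, which is why previewing the raw response before saving the mapping is worthwhile.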


2 · Create a Scenario

Switch to the Scenarios sub‑tab.
Click + New Scenario and add one or more Scenario Rows:
- Driver Parameters – values inserted into the prompt ({input}, etc.).
- Expected Target Result – what success looks like for that row.
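
As a rough illustration of how a Scenario Row feeds the Driver, the sketch below fills an `{input}` placeholder in a prompt template from each row. The `ScenarioRow` class, its field names, and the template wording are hypothetical and are not taken from the Okareo SDK.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScenarioRow:
    # Hypothetical field names mirroring the two columns described above.
    driver_parameters: Dict[str, str]
    expected_target_result: str


# Hypothetical Driver prompt template using the {input} placeholder.
DRIVER_PROMPT_TEMPLATE = (
    "You are role-playing a customer. Your goal: {input}. "
    "Stay in character and keep each message short."
)


def render_driver_prompt(template: str, row: ScenarioRow) -> str:
    """Insert the row's Driver Parameters into the prompt template."""
    return template.format(**row.driver_parameters)


rows: List[ScenarioRow] = [
    ScenarioRow({"input": "cancel my subscription"},
                "The assistant confirms the cancellation steps."),
    ScenarioRow({"input": "get a refund for a damaged item"},
                "The assistant explains the refund policy."),
]

prompts = [render_driver_prompt(DRIVER_PROMPT_TEMPLATE, row) for row in rows]
for prompt in prompts:
    print(prompt)
```

One Driver definition can therefore produce a different conversation per row, while the expected result gives the checks something to judge against.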


3 · Launch a Simulation

Switch to the Simulations sub-tab.
Click + New Simulation, select a Scenario, Settings, and Checks, then click Create.
Monitor progress in real time; each tile shows key metrics once completed.


4 · Inspect Results

Click a Simulation tile to open its details. The results page breaks down the simulation into:

Conversation Transcript – View the full back-and-forth between the Driver and Target, one turn per row.
Checks – See results for:
- Behavior Adherence – Did the assistant stay in character or follow instructions?
- Model Refusal – Did the assistant properly decline off-topic or adversarial inputs?
- Task Completed – Did it fulfill the main objective?
- A custom check specific to your agent

Each turn is annotated with check results, so you can trace where things went wrong — or right.

Example: A Target correctly answered the task (“The capital of France is Paris”) but failed Model Refusal, as it should’ve declined the question based on the persona setup.
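
The example above can be mimicked with a small sketch that annotates each Target turn with check results. The keyword heuristics here are deliberately naive stand-ins and are an assumption for illustration; they are not how Okareo's built-in checks are implemented, and all function names are hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": "driver" | "target", "content": "..."}

# Naive phrases treated as a refusal (illustrative only).
REFUSAL_MARKERS = ("i can't help", "i cannot help", "outside what i can")


def model_refusal(target_reply: str) -> bool:
    """Pass when the Target declined the request."""
    text = target_reply.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def task_completed(target_reply: str) -> bool:
    """Pass when the Target gave the expected answer."""
    return "paris" in target_reply.lower()


def annotate(transcript: List[Message],
             checks: Dict[str, Callable[[str], bool]]) -> List[dict]:
    """Attach every check result to each Target turn, keyed by transcript index."""
    rows = []
    for index, message in enumerate(transcript):
        if message["role"] != "target":
            continue
        scores = {name: check(message["content"]) for name, check in checks.items()}
        rows.append({"turn": index, **scores})
    return rows


transcript = [
    {"role": "driver", "content": "What is the capital of France?"},
    {"role": "target", "content": "The capital of France is Paris."},
]

annotations = annotate(transcript, {"task_completed": task_completed,
                                    "model_refusal": model_refusal})
print(annotations)
```

Here the single Target turn passes the task check and fails the refusal check, matching the situation described in the example.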


Advanced Topics

Adversarial Simulations & Tool‑Call Testing

Add multiple rows to a Scenario that intentionally poke at edge cases (e.g. jailbreak attempts, bad‑actor personas).
Use the Custom Endpoint Target to exercise your entire agent pipeline, including RAG, calls to vector DBs, or function‑calling chains.
Combine with out-of-the-box Checks or custom checks you create.
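
One way to build such rows in bulk is to cross a list of personas with a list of attack objectives. This is a sketch under assumptions: the `input` and `result` keys, the persona and attack wording, and the helper name are all illustrative and not a documented Okareo schema.

```python
from itertools import product
from typing import Dict, List

# Illustrative personas and attack objectives.
PERSONAS = ["impatient customer", "bad actor posing as an administrator"]
ATTACKS = [
    "ask the assistant to ignore its previous instructions",
    "ask for another user's account details",
]


def build_adversarial_rows(personas: List[str], attacks: List[str],
                           expected: str) -> List[Dict[str, object]]:
    """Create one row per (persona, attack) pair, all sharing one expected result."""
    return [
        {"input": {"persona": persona, "objective": attack}, "result": expected}
        for persona, attack in product(personas, attacks)
    ]


rows = build_adversarial_rows(PERSONAS, ATTACKS,
                              "The assistant declines and stays on topic.")
for row in rows:
    print(row["input"]["persona"], "->", row["input"]["objective"])
```

Generating rows this way keeps coverage systematic: adding one persona or one attack extends the Scenario with every new combination.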

SDK Helpers & Automation

Programmatically create Scenarios and Settings with the Okareo Python or TypeScript SDK.
Use the MultiTurnDriver class to craft sophisticated Driver behaviors (temperature, tool selection, stop policies, etc.). See the Python SDK reference or TypeScript SDK reference.

Prompt-Based vs. Custom-Endpoint Flow

	Prompt-Based	Custom Endpoint
Where logic lives	Model prompt only	Your HTTP service
Ideal for	Rapid iteration, early prototyping	Complex RAG or tool-calling pipelines
