Skip to main content

Persona & Behavior Simulation in Multi-Turn Dialogues

Okareo lets you simulate and evaluate full conversations - from straightforward question and answer flows to complex, agent-to-agent interactions. With Multi-Turn Simulations you can:

  • Verify behaviors like persona adherance and task completion across an entire dialog.
  • Stress-test your assistant with adversarial personals.
  • Call out to custom endpoints (such as your own service or RAG pipeline) and evaluate the real responses.
  • Track granular metrics and compare them over time.

Why Multi‑Turn Simulation?

Use Multi‑Turn when success depends on how the assistant behaves over time, not just what it says once.

Single‑Turn EvaluationMulti‑Turn Simulation
Spot‑checks isolated responses.Captures conversation dynamics: context, memory, tool calls, persona drift.
Limited resistance to prompt injections.Lets you inject adversarial or off‑happy‑path turns to probe robustness.
Limited visibility into session state or external calls.Can follow and score API calls, function‑calling, or custom‑endpoint responses throughout the dialog.

Core Concepts

Key Entities

TermWhat it is
Setting ProfileConfiguration blueprint that defines both the Target under test and the Driver Agent that will converse with it. Controls temperatures, stop logic, prompt template, repeats, and more.
TargetThe system under test—either a hosted model (e.g. gpt‑4o‑mini) or a mapping that tells Okareo how to call your service (Custom Endpoint / Custom Model).
Driver AgentA configurable simulation of a user persona defined in the Setting Profile. It sends messages to the Target according to its prompt template, temperature, and flow controls.
ScenarioA reusable collection of Scenario Rows. Each row provides runtime parameters (inserted into the Driver prompt) plus an expected result for checks to judge against.
Custom EndpointA REST mapping (URL, method, headers, body template, JSON paths) that lets Okareo call your running agent, LLM pipeline, RAG service, etc. during a simulation.
Custom ModelA class you implement by subclassing CustomModel in the Okareo SDK (Python or TypeScript). Provide an invoke() method and Okareo treats your proprietary code or on‑prem model as a first‑class, versioned model.
CheckA metric that scores the dialog (numeric or boolean). Built‑ins cover behavior adherence, model refusal, task completion, etc.; you can supply custom checks for your use case.

Driver vs. Scenario A Setting Profile defines one Driver Agent. A Scenario holds data rows that are injected into that Driver’s prompt. During a simulation, the Driver and Target alternate turns until a stop condition or maximum turns is reached.

Execution Objects

TermWhat it is
SimulationA single run that alternates Driver → Target turns using one Scenario and one Target Setting. It records the conversations between target and driver.
EvaluationThe scoring phase that executes all enabled Checks against the Simulation and produces metrics.

How It Works (High‑Level)

  1. Prepare artifacts
    • Setting Profile – selects a Target (Hosted Model or Custom Endpoint) and configures the Driver Agent (temperature, prompt template, stop logic, repeats, etc.).
    • Scenario – table of Scenario Rows (runtime parameters + expected results).
    • Checks – metrics that will score the Simulation (optional).
  2. Run a Simulation from the Multi‑Turn Simulation → Simulations tab.
  3. Okareo runs the simulation
    • If the Target is a Custom Endpoint, Okareo makes real HTTP calls using your mapping (URL, headers, body template).
    • Checks are calculated at each turn.
  4. When a stop criterion is met, the run ends and Checks are computed a final time.
  5. Review results: scores and the full dialog side‑by‑side.

Quick‑Start via the UI

1 · Define a Setting Profile (Settings sub‑tab)

  1. Click ➕ New Profile.

  2. Choose a Target:

    • Hosted Model – pick from Okareo’s catalog (e.g. gpt‑4o‑mini).
    • Custom Endpoint – map URL, headers, body template, and JSON paths.
  3. Configure the Driver Agent:

    • Driver Temperature, Repeats, Max Turns.
    • First Speaker – Driver or Target.
    • Driver Prompt Template – choose a template or write your own.
    • Stop When – select a check that terminates the dialog.

Tip: Unsure which JSONPath to use? Click Test Start Session to preview the raw response and adjust paths until the preview highlights the correct fields.

Target Settings

2 · Create a Scenario

  1. Switch to the Scenarios sub‑tab.

  2. Click + New Scenario and add one or more Scenario Rows:

    • Driver Parameters – values inserted into the prompt ({input}, etc.).
    • Expected Target Result – what success looks like for that row.

New Scenario

3 · Launch a Simulation

  1. Switch to the Simulations sub-tab.
  2. + New Simulation → select Scenario, Settings, and Checks → Create.
  3. Monitor progress in real time; each tile shows key metrics once completed.

Simulation

4 · Inspect Results

Click a Simulation tile to open its details. The results page breaks down the simulation into:

  • Conversation Transcript – View the full back-and-forth between the Driver and Target, one turn per row.
  • Checks – See results for:
    • Behavior Adherence – Did the assistant stay in character or follow instructions?
    • Model Refusal – Did the assistant properly decline off-topic or adversarial inputs?
    • Task Completed – Did it fulfill the main objective?
    • A custom check specific to your agent

Each turn is annotated with check results, so you can trace where things went wrong — or right.

Example: A Target correctly answered the task (“The capital of France is Paris”) but failed Model Refusal, as it should’ve declined the question based on the persona setup.

Results

Advanced Topics

Adversarial Simulations & Tool‑Call Testing

  • Add multiple rows to a Scenario that intentionally poke at edge cases (e.g. jailbreak attempts, bad‑actor personas).
  • Use the Custom Endpoint Target to exercise your entire agent pipeline, including RAG, calls to vector DBs, or function‑calling chains.
  • Combine with out-of-the-box Checks or custom checks you create.

SDK Helpers & Automation

  • Programmatically create Scenarios and Settings with the Okareo Python or TypeScript SDK.
  • Use the MultiTurnDriver class to craft sophisticated Driver behaviors (temperature, tool selection, stop policies, etc.). See the Python SDK reference or TypeScript SDK reference.

Prompt-Based vs. Custom-Endpoint Flow

Prompt-BasedCustom Endpoint
Where logic livesModel prompt onlyYour HTTP service
Ideal forRapid iteration, early prototypingComplex RAG or tool-calling pipelines