Skip to main content

Persona & Behavior Simulation in Multi-Turn Dialogues

Okareo lets you simulate and evaluate full conversations - from straightforward question and answer flows to complex, agent-to-agent interactions. With Multi-Turn Simulations you can:

  • Verify behaviors like persona adherance and task completion across an entire dialog.
  • Stress-test your assistant with adversarial personals.
  • Call out to custom endpoints (such as your own service or RAG pipeline) and evaluate the real responses.
  • Track granular metrics and compare them over time.

Why Multi‑Turn Simulation?

Use Multi‑Turn when success depends on how the assistant behaves over time, not just what it says once.

Single‑Turn EvaluationMulti‑Turn Simulation
Spot‑checks isolated responses.Captures conversation dynamics: context, memory, tool calls, persona drift.
Limited resistance to prompt injections.Lets you inject adversarial or off‑happy‑path turns to probe robustness.
Limited visibility into session state or external calls.Can follow and score API calls, function‑calling, or custom‑endpoint responses throughout the dialog.

Core Concepts

Key Entities

TermWhat it is
TargetThe system under test – either a hosted model (e.g. gpt-4o-mini) or a configuration that tells Okareo how to call your service (Custom Endpoint / Custom Model).
DriverScripted speaker that interacts with the target. A Driver row defines a persona and optional expected behaviors that the Target should satisfy.
ScenarioA collection of Driver rows stored as a reusable asset. One Scenario can power many simulations.
Custom EndpointA REST mapping (URL, method, headers, body template, JSON paths) that lets Okareo call your running agent, LLM pipeline, RAG service, etc. during a simulation.
Custom ModelA class you implement by subclassing CustomModel in the Okareo SDK (Python or TypeScript). Provide an invoke() method and Okareo treats your proprietary code or on‑prem model as a first‑class, versioned model.
CheckA metric that scores the dialog (numeric or boolean). Built‑ins cover behavior adherence, model refusal, task completion, etc.; you can supply custom checks for your use case.

Execution Objects

TermWhat it is
SimulationA single run that alternates Driver → Target turns using one Scenario and one Target Setting. It records the conversations between target and driver.
EvaluationThe scoring phase that executes all enabled Checks against the Simulation and produces metrics.

Note: Driver ≠ Scenario A Scenario groups one or more Driver persona rows. Each simulation alternates turns between the Driver and the Target until a stop condition or maximum turns is reached.

How It Works (High‑Level)

  1. Prepare artifacts
    • Driver Scenario (CSV or table in the UI)
    • Target settings → either a Hosted Model or Custom Endpoint
    • Checks to score the Simulation (optional)
  2. Run a Simulation from the Multi‑Turn Simulation → Simulations tab.
  3. Okareo runs the simulation
    • If the Target is a Custom Endpoint, Okareo makes real HTTP calls using your mapping (URL, headers, body template).
    • Checks are calculated at each turn.
  4. When a stop criterion is met, the run ends and Checks are computed a final time.
  5. Review results: scores and the full dialog side‑by‑side.

Quick‑Start via the UI

1 · Define a Target agent profile (Settings sub‑tab)

  • Prompt – point to an existing hosted model (e.g. gpt-4o-mini)
  • Custom Endpoint – call your API. Provide:
    • URL & HTTP method
    • Headers / query params
    • Body template (supports {session_id}, {latest_message}, {message_history[i:j]} variables)
    • Response Session ID Path (e.g. response.thread_id)
    • Response Message Path (e.g. response.message)

Tip: Unsure which JSONPath to use? Click Test Start Session to preview the raw response and adjust paths until the preview highlights the correct fields.

Target Settings

2 · Choose or Define a Driver Persona

  1. Switch to the Scenarios sub‑tab.
  2. Click + New Scenario and fill in:
    • Driver Persona – natural‑language role description.
    • Expected Behaviors – what success looks like.

New Scenario

3 · Launch a Simulation

  1. Switch to the Simulations sub-tab.
  2. + New Simulation → select Scenario, Settings, and Checks → Run.
  3. Monitor progress in real time; each tile shows key metrics once completed.

Simulation

4 · Inspect Results

Click a Simulation tile to open its details. The results page breaks down the simulation into:

  • Conversation Transcript – View the full back-and-forth between the Driver and Target, one turn per row.
  • Checks – See results for:
    • Behavior Adherence – Did the assistant stay in character or follow instructions?
    • Model Refusal – Did the assistant properly decline off-topic or adversarial inputs?
    • Task Completed – Did it fulfill the main objective?
    • A custom check specific to your agent

Each turn is annotated with check results, so you can trace where things went wrong — or right.

Example: A Target correctly answered the task (“The capital of France is Paris”) but failed Model Refusal, as it should’ve declined the question based on the persona setup.

Results

Advanced Topics

Adversarial Simulations & Tool‑Call Testing

  • Add multiple rows to a Scenario that intentionally poke at edge cases (e.g. jailbreak attempts, bad‑actor personas).
  • Use the Custom Endpoint Target to exercise your entire agent pipeline, including RAG, calls to vector DBs, or function‑calling chains.
  • Combine with out-of-the-box Checks or custom checks you create.

SDK Helpers & Automation

  • Programmatically create Scenarios and Settings with the Okareo Python or TypeScript SDK.
  • Use the MultiTurnDriver class to craft sophisticated Driver behaviors (temperature, tool selection, stop policies, etc.). See the Python SDK reference or TypeScript SDK reference.

Prompt-Based vs. Custom-Endpoint Flow

Prompt-BasedCustom Endpoint
Where logic livesModel prompt onlyYour HTTP service
Ideal forRapid iteration, early prototypingComplex RAG or tool-calling pipelines