
Prompt-Target Multi-Turn Simulations

Okareo lets you simulate full back-and-forth conversations using only prompts—no need to implement any custom API endpoints or HTTP handlers. This guide walks you through how to run a prompt-only multi-turn simulation, step-by-step, using either the Okareo UI or SDK.

You'll follow the same four core steps you saw in the Multi-Turn Overview and then inspect the results. Every action is powered purely by prompts.

Cookbook examples for this guide are available:

Tip: For more background on how simulations work in Okareo, see the Simulation Overview.

1 · Configure a Target

A Target is the model under test (MUT) — the foundation model and its system prompt you want to evaluate.

  1. Navigate to Targets and click ➕ Create Target.

    [Screenshot: Targets zero state]

  2. Choose Prompt, select a model, and add a system prompt:

    [Screenshot: Target form]

  3. Click Create. Your Target is now available to reuse in any simulation.

2 · Register a Driver

A Driver is the simulated user persona that talks to your Target.

Info: Configuring LLMs to role-play as a user can be challenging. See our guide on Creating Drivers.

  1. Go to Simulations → Drivers and click ➕ New Driver.

  2. Fill in:

    • Name – a descriptive label (e.g., “Busy User”).
    • Temperature – variability of the driver’s behavior (0 = deterministic).
    • Prompt Template – the persona & rules. You can start from a template and edit it, or paste your own.
      Use {scenario_input.*} to reference fields from your Scenario rows.

    [Screenshot: Driver form]

  3. Click Create. Your Driver is now available to reuse in any simulation.
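
To make the {scenario_input.*} placeholder concrete, here is a minimal Python sketch of how a Driver prompt template could be filled from one Scenario row. It uses Python's built-in str.format as a stand-in; Okareo's own templating may behave differently, and the template text and the render_driver_prompt helper are illustrative assumptions, not part of the Okareo SDK.

```python
from types import SimpleNamespace

# Hypothetical driver prompt; only the {scenario_input.*} syntax comes from the guide.
DRIVER_TEMPLATE = (
    "You are {scenario_input.name}, a busy bank customer. "
    "Your objective: {scenario_input.objective}. "
    "Stay in character and keep replies short."
)


def render_driver_prompt(template: str, scenario_input: dict) -> str:
    """Fill {scenario_input.<field>} placeholders from one Scenario row's input."""
    # str.format resolves dotted names by attribute lookup, so wrap the dict
    # in an object whose attributes are the row's fields.
    return template.format(scenario_input=SimpleNamespace(**scenario_input))


row_input = {"name": "Paul", "objective": "Reset your debit PIN"}
prompt = render_driver_prompt(DRIVER_TEMPLATE, row_input)
print(prompt)
```

In this sketch, a field that the template references but the row does not supply raises an AttributeError, which is one way to catch a mismatch between a Driver and a Scenario before running anything.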

3 · Create a Scenario

A Scenario defines what should happen in each simulation run. Think of it as a test case matrix.

A Scenario is made up of one or more Scenario Rows. Each row supplies runtime parameters that are inserted into the Driver Prompt, plus an Expected Target Result that Okareo’s checks (like Behavior Adherence) will judge against.

How simulation count works:

The total number of simulations = Number of Scenario Rows × Repeats (from the Setting Profile)

Examples:

  • 1 Scenario Row × Repeats = 1 → 1 simulation
  • 2 Scenario Rows × Repeats = 1 → 2 simulations
  • 2 Scenario Rows × Repeats = 2 → 4 simulations (2 runs per row)
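
The same arithmetic can be written as a short Python sketch, which is handy for estimating run size before launching. The function name total_simulations is a hypothetical helper for illustration, not an Okareo SDK call.

```python
def total_simulations(scenario_rows: int, repeats: int) -> int:
    """Total simulations = number of Scenario Rows x Repeats."""
    if scenario_rows < 0 or repeats < 0:
        raise ValueError("scenario_rows and repeats must be non-negative")
    return scenario_rows * repeats


# The three examples from the list above.
for rows, repeats in [(1, 1), (2, 1), (2, 2)]:
    print(f"{rows} row(s) x {repeats} repeat(s) -> {total_simulations(rows, repeats)} simulation(s)")
```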

To build a Scenario in the UI:

  1. Go to Studio → Synthetic Scenario Copilot.

  2. Add rows:

    • Input (JSON): any fields your driver prompt references, e.g. { "name": "Paul", "objective": "Reset your debit PIN" }
    • Expected Result (text): the success criteria (e.g., “User completes debit PIN reset and confirms it’s done.”)

  3. To generate rows with AI, describe them in the text box at the bottom (“Describe the desired properties…”), then refine as needed.

  4. Save the scenario set: hover over the toolbar icon in the lower-right corner of the dialog, then click the save icon to name and save it.

    [Screenshot: Scenario Copilot]

Your scenario set is now available to reuse across simulations.
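
Because each row's Input JSON must supply every field the Driver prompt references, a quick local check can catch gaps before you save a scenario set. The following Python sketch is one way to do that under stated assumptions: the template text, the helper names, and the second sample row are hypothetical, and the placeholder parsing relies on Python's string.Formatter rather than anything Okareo-specific.

```python
import json
from string import Formatter

PREFIX = "scenario_input."

# Hypothetical driver template used only for this check.
TEMPLATE = "You are {scenario_input.name}. Goal: {scenario_input.objective}."


def referenced_fields(template: str) -> set:
    """Collect the field names used as {scenario_input.<field>} in a template."""
    return {
        name[len(PREFIX):]
        for _, name, _, _ in Formatter().parse(template)
        if name and name.startswith(PREFIX)
    }


def missing_fields(template: str, row_input_json: str) -> set:
    """Return template fields that a row's Input JSON does not provide."""
    row_input = json.loads(row_input_json)
    return referenced_fields(template) - set(row_input)


complete_row = '{ "name": "Paul", "objective": "Reset your debit PIN" }'
incomplete_row = '{ "name": "Ana" }'  # made-up row that omits "objective"

print(missing_fields(TEMPLATE, complete_row))
print(missing_fields(TEMPLATE, incomplete_row))
```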

4 · Launch a Simulation

  1. Navigate to Simulations and click ➕ Create Multi-Turn Simulation.

    [Screenshot: Simulations]

  2. Select a Target, Driver, Scenario, and Checks.

    [Screenshot: Simulations form]

  3. Click Create. You can watch the progress of the simulation.

    [Screenshot: Simulations index]
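
Conceptually, each simulation alternates turns between the Driver and the Target until a stopping condition is reached. The Python sketch below illustrates that loop with stub functions in place of real LLM calls. It is not Okareo's implementation: the function names, the message shape, the fixed max_turns limit, and the choice to let the Driver speak first are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}
Speaker = Callable[[List[Message]], str]


def simulate(driver: Speaker, target: Speaker, max_turns: int = 3) -> List[Message]:
    """Alternate Driver and Target messages for a fixed number of turns."""
    transcript: List[Message] = []
    for _ in range(max_turns):
        # The Driver (simulated user) speaks, then the Target (model under test) replies.
        transcript.append({"role": "driver", "content": driver(transcript)})
        transcript.append({"role": "target", "content": target(transcript)})
    return transcript


def stub_driver(history: List[Message]) -> str:
    """Stand-in for an LLM role-playing the user persona."""
    return "I need to reset my debit PIN." if not history else "Thanks, what is the next step?"


def stub_target(history: List[Message]) -> str:
    """Stand-in for the prompt-based Target."""
    return f"Sure, I can help. (replying to: {history[-1]['content']})"


transcript = simulate(stub_driver, stub_target, max_turns=2)
for message in transcript:
    print(f"{message['role']}: {message['content']}")
```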

5 · Inspect Results

Click a Simulation tile to open its details. The results page breaks down the simulation into:

  • Conversation Transcript – View the full back-and-forth between the Driver and Target, one turn per row.
  • Checks – See results for:
    • Behavior Adherence – Did the assistant stay in character or follow instructions?
    • Model Refusal – Did the assistant properly decline off-topic or adversarial inputs?
    • Task Completed – Did it fulfill the main objective?
    • A custom check specific to your agent

Each turn is annotated with check results, so you can trace where things went wrong — or right.

[Screenshot: Results]
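
Since check results are attached to individual turns, one useful way to read them is to find the first turn where a given check failed. The Python sketch below shows that idea on made-up sample data. The turn_results structure and the first_failure helper are hypothetical and do not reflect the actual format Okareo returns.

```python
from typing import Dict, List, Optional

# Invented sample data for illustration only; not real simulation output.
turn_results: List[Dict] = [
    {"turn": 1, "checks": {"Behavior Adherence": True, "Model Refusal": True}},
    {"turn": 2, "checks": {"Behavior Adherence": False, "Model Refusal": True}},
    {"turn": 3, "checks": {"Behavior Adherence": True, "Model Refusal": True}},
]


def first_failure(results: List[Dict], check_name: str) -> Optional[int]:
    """Return the number of the first turn where the check failed, or None."""
    for result in results:
        # Only an explicit False counts as a failure; a missing check is ignored.
        if result["checks"].get(check_name) is False:
            return result["turn"]
    return None


print(first_failure(turn_results, "Behavior Adherence"))
print(first_failure(turn_results, "Model Refusal"))
```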

Next Steps

  • Tweak prompts and re-run to compare scores.
  • Add different Checks in the UI or in SDK calls to fit your use case.
  • Automate nightly runs in CI using the SDK.

Ready to move beyond prompts? See Custom Endpoint Multi-Turn to plug in your own API.


That's it! You now have a complete, repeatable workflow for evaluating assistants with prompt-based multi-turn simulations—entirely from the browser or your codebase.