Prompt-Based Multi-Turn Simulations

Okareo lets you drive an entire conversation with a single prompt per turn—no custom HTTP handlers required. This guide shows you, step-by-step, how to run a multi-turn simulation using prompts only, in either the Okareo UI or SDK.

You'll follow the same four core steps you saw in the Multi-Turn Overview, but every action is powered purely by prompts.

Cookbook examples for this guide are available.

1 · Define a Target agent profile

  1. Navigate to Multi-Turn Simulations → Settings.
  2. Click ➕ New Target and choose "Prompt".
  3. Fill in the model details—e.g. gpt-4o-mini—and give the target a clear name.
  4. Configure your target agent by setting a system prompt and temperature.

Define Target – UI placeholder
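Outside the UI, the same target profile can be sketched as plain configuration. The dictionary below is an illustrative sketch only; the field names are assumptions for clarity, not the exact Okareo SDK schema:

```python
# Illustrative target profile for a prompt-based agent.
# Field names are assumptions for illustration; consult the Okareo SDK docs
# for the exact parameters.
target_profile = {
    "name": "returns-support-agent",
    "model": "gpt-4o-mini",  # model that powers the Target
    "temperature": 0.2,      # low temperature keeps replies consistent
    "system_prompt": (
        "You are a helpful support agent for an online store. "
        "Answer questions about the return policy clearly and concisely."
    ),
}
```

A low target temperature is a common choice here: the Driver supplies the variability, so the Target can stay deterministic.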

Driver Parameters

| Parameter | Description |
| --- | --- |
| `driver_temperature` | Controls the randomness of the simulated user (Driver) |
| `max_turns` | Maximum number of back-and-forth messages |
| `repeats` | Repeats each test row to capture variance |
| `first_turn` | Whether the `"driver"` or `"target"` starts the conversation |
| `stop_check` | Defines the stopping condition (via a check) |
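Taken together, these parameters might be collected into a settings object like the sketch below. The dictionary shape and the check name `task_completed` are assumptions for illustration, not the literal SDK interface:

```python
# Illustrative driver settings mirroring the parameter table above.
driver_settings = {
    "driver_temperature": 0.8,       # higher values -> more varied simulated users
    "max_turns": 6,                  # cap on back-and-forth messages
    "repeats": 3,                    # re-run each row to capture variance
    "first_turn": "driver",          # who opens the conversation
    "stop_check": "task_completed",  # check that ends the run early (hypothetical name)
}

# first_turn accepts exactly two values, per the table above.
assert driver_settings["first_turn"] in ("driver", "target")
```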

2 · Choose or Define a Driver Persona

  1. Switch to the Scenarios sub-tab.
  2. Click + New Scenario and fill in:
    • Driver Persona – e.g. “Confused shopper asking about returns”.
    • Expected Behaviors – what success looks like (“Explains policy & offers label”).

New Scenario
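As data, a scenario is just a persona plus a list of expected behaviors. Here is a minimal sketch using the example values above (the key names are illustrative, not the exact SDK schema):

```python
# Illustrative scenario definition: a Driver persona plus expected behaviors.
# Key names are assumptions for illustration.
scenario = {
    "driver_persona": "Confused shopper asking about returns",
    "expected_behaviors": [
        "Explains the return policy",
        "Offers a prepaid return label",
    ],
}
```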

3 · Launch a Simulation

Link the target and scenario together, choose run-time settings (temperature, max turns), and start the run.

  1. Switch to the Simulations sub-tab.
  2. New Simulation → select Scenario, Settings, and Checks → Run.
  3. Monitor progress in real time; each tile shows key metrics once the run completes.

Simulation
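Conceptually, a run alternates Driver and Target turns until `max_turns` is reached or the stop check fires. The toy loop below illustrates that flow with canned replies; it is a teaching sketch, not the Okareo runtime:

```python
# Toy simulation loop: a Driver (simulated user) and Target (agent) alternate
# turns until max_turns is hit or a stand-in stop condition fires.
# All replies are canned; this only illustrates the control flow.

def driver(turn: int) -> str:
    """Simulated user: asks scripted questions in order."""
    questions = [
        "Hi, can I return my order?",
        "It arrived two weeks ago.",
        "Great, thanks!",
    ]
    return questions[min(turn, len(questions) - 1)]

def target(message: str) -> str:
    """Canned agent replies keyed off the user's message."""
    text = message.lower()
    if "return" in text:
        return "Yes - items can be returned within 30 days. Want a prepaid label?"
    if "thanks" in text:
        return "Happy to help!"
    return "You're within the 30-day window, so I'll email you a return label."

def run_simulation(max_turns: int = 6) -> list[tuple[str, str]]:
    transcript = []
    for turn in range(max_turns):
        user_msg = driver(turn)
        transcript.append(("driver", user_msg))
        transcript.append(("target", target(user_msg)))
        if "thanks" in user_msg.lower():  # stand-in for a real stop_check
            break
    return transcript

transcript = run_simulation()
```

The stop condition here is a trivial substring match; in a real run, `stop_check` points at a check that judges the conversation.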

4 · Inspect Results

Click a Simulation tile to open its details. The results page breaks down the simulation into:

  • Conversation Transcript – View the full back-and-forth between the Driver and Target, one turn per row.
  • Checks – See results for:
    • Behavior Adherence – Did the assistant stay in character or follow instructions?
    • Model Refusal – Did the assistant properly decline off-topic or adversarial inputs?
    • Task Completed – Did it fulfill the main objective?
    • Custom Checks – any checks specific to your agent

Each turn is annotated with check results, so you can trace where things went wrong — or right.

Results
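At its core, a check is a predicate over the transcript. Here is a toy version of a Task Completed check; the pass criteria are invented for illustration and are not Okareo's actual check logic:

```python
# Toy "Task Completed" check: a predicate over a finished transcript.
# The pass criteria below are invented for illustration.
finished_transcript = [
    ("driver", "Can I return my order?"),
    ("target", "Yes - within 30 days. I can send a prepaid label."),
]

def task_completed(transcript: list[tuple[str, str]]) -> bool:
    """Did the Target explain the policy and offer a label?"""
    replies = " ".join(msg for role, msg in transcript if role == "target")
    return "30 days" in replies and "label" in replies

results = {"Task Completed": task_completed(finished_transcript)}
```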

Next Steps

  • Tweak prompts and re-run to compare scores.
  • Add different Checks in the UI or in SDK calls to fit your use case.
  • Automate nightly runs in CI using the SDK.

Ready to move beyond prompts? See Custom Endpoint Multi-Turn to plug in your own API.


That's it! You now have a complete, repeatable workflow for evaluating assistants with prompt-based multi-turn simulations—entirely from the browser or your codebase.