Custom-Endpoint Multi-Turn Simulations
Okareo can drive a full conversation against your running service (a RAG pipeline, tool-calling agent, or any HTTP API) by mapping requests and JSON responses to a Custom Endpoint Target. This guide shows you, step by step, how to run a multi-turn simulation using custom endpoints in either the Okareo UI or the SDK.
You'll follow the same four core steps you saw in the Multi-Turn Overview.
Cookbook examples for this guide are available.
For more background on how simulations work in Okareo, see the Simulation Overview.
1 · Define a Setting Profile
A Setting Profile in Okareo is the configuration blueprint that controls how your multi-turn simulation operates.
It defines both sides of the conversation:
- Target: the system you are testing.
You can configure the Target in two ways:
- Custom Endpoint – any HTTP-accessible API you provide (for example, your RAG pipeline, tool-calling agent, or backend chat service).
- Foundation Model – a pre-integrated model (e.g., GPT-4o mini) selected from Okareo’s catalog of supported providers.
- Driver Agent: a configurable user persona that plays the other side of the conversation so the Target can be evaluated.
You define the Driver by specifying:
- The prompt template or persona.
- Temperature, stop conditions, and conversation flow behavior.
A Setting Profile specifies:
- How sessions are started (e.g., to create a conversation thread or context).
- How each turn is sent and how the responses are extracted.
- How sessions are ended or finalized.
- What persona or behavior the Driver Agent should emulate.
This separation lets you test any Target with different Driver configurations—changing temperature, prompts, or stopping logic without touching your endpoint code.
- Okareo UI
- Python
- TypeScript
1. Navigate to Multi-Turn Simulation → Settings and click ➕ New Profile.
2. Select Custom Endpoint as your Setting Profile type.
3. In the Driver section, configure:
- Stop When – the check that determines when the simulation should stop.
- Driver Temperature – how random or exploratory the Driver’s responses should be.
- Repeats – how many times to re-run the simulation.
- Max Turns – the maximum number of exchanges between the target and the driver.
- First Speaker – whether the Driver or Target starts the conversation.
- Driver Prompt Template – the system prompt defining the Driver’s persona. You can write your own or select one of our templates.
Info: Configuring LLMs to role-play as a user can be challenging. See our guide on Creating Drivers.
4. Configure Target Parameters.
You must define how Okareo communicates with your API across three phases of the conversation:
- Start Session (Optional) – called once before the conversation begins. Use this to:
- create or fetch a session ID,
- set up initial context, or
- return a welcome message.
- Next Turn (Required) – called every turn to send the Driver’s message and receive your system’s reply. This is the core interaction loop.
- End Session (Optional) – called once after the simulation ends. Use this to:
- clean up resources,
- close the session, or
- log final metadata.
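If you are building an endpoint from scratch, the three phases map naturally onto three handlers. Below is a minimal sketch in plain Python (no web framework); the in-memory session store and the echo-style reply are hypothetical placeholders for your own agent logic.

```python
import uuid

# In-memory session store; a real service would use a database or cache.
SESSIONS: dict[str, list[dict]] = {}

def start_session(user_id: str) -> dict:
    """Start Session phase: create a session ID and return it to Okareo."""
    session_id = str(uuid.uuid4())
    SESSIONS[session_id] = []
    return {"response": {"id": session_id}}

def next_turn(session_id: str, latest_message: str) -> dict:
    """Next Turn phase: record the Driver's message and return a reply."""
    history = SESSIONS[session_id]
    history.append({"role": "user", "content": latest_message})
    reply = f"You said: {latest_message}"  # placeholder for your agent's real logic
    history.append({"role": "assistant", "content": reply})
    return {"response": {"messages": history}}

def end_session(session_id: str) -> dict:
    """End Session phase: clean up the session."""
    SESSIONS.pop(session_id, None)
    return {"status": "closed"}
```

The return shapes here line up with the example response paths used later in this guide (`response.id` and `response.messages.-1.content`), but your API can use any shape as long as the paths you configure match it.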
All three phases share the same set of parameters:
- Method – HTTP verb (e.g., `POST`, `GET`)
- URL – the endpoint to call
- Headers – authentication or custom headers (e.g., `api-key`)
- Query Parameters – optional URL parameters
- Body Template – JSON body with template variables (e.g., `{latest_message}`, `{session_id}`)
- Response Message Path – JSONPath to extract the assistant’s reply
- Response Session ID Path – JSONPath to extract the session/thread identifier
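The response paths use dot notation, with negative list indices such as `-1` meaning "last element." As a rough mental model of how a path like `response.messages.-1.content` resolves against your JSON, consider this sketch; `resolve_path` is an illustrative helper, not part of the Okareo SDK.

```python
import json

def resolve_path(payload, path: str):
    """Walk a dot-separated path, treating integer segments (incl. -1) as list indices."""
    node = payload
    for segment in path.split("."):
        if isinstance(node, list):
            node = node[int(segment)]  # e.g., "-1" selects the last list element
        else:
            node = node[segment]
    return node

# A sample JSON body your Next Turn endpoint might return:
sample = json.loads(
    '{"response": {"messages": [{"content": "Hi!"},'
    ' {"content": "Sure, here is our policy."}]}}'
)

resolve_path(sample, "response.messages.-1.content")
# -> "Sure, here is our policy."
```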
5. Click Test Calls in the UI to make sure your configuration returns the expected fields before saving.
6. Click Create to save the Setting Profile.
# Import paths below assume a recent Okareo Python SDK; adjust to your installed version.
from okareo import Okareo
from okareo.model_under_test import (
    CustomEndpointTarget,
    MultiTurnDriver,
    SessionConfig,
    StopConfig,
    TurnConfig,
)

okareo = Okareo("<OKAREO_API_KEY>")

start_config = SessionConfig(
url="https://api.example.com/v1/session",
method="POST",
headers={"Authorization": "Bearer <TOKEN>"},
body={"userId": "<AGENT_ID>"},
response_session_id_path="response.id",
)
next_config = TurnConfig(
url="https://api.example.com/v1/session/messages",
method="POST",
headers={"Authorization": "Bearer <TOKEN>"},
body={"sessionId": "{session_id}", "messages": "{message_history}"},
response_message_path="response.messages.-1.content",
)
target = CustomEndpointTarget(start_config, next_config)
driver_model = MultiTurnDriver(
target=target,
driver_temperature=0.7,
max_turns=6,
stop_check=StopConfig(check_name="behavior_adherence"),
)
model = okareo.register_model(
name="Custom Endpoint Demo Model",
model=driver_model,
update=True,
)
Driver Parameters
| Parameter | Description |
| --- | --- |
| `driver_temperature` | Controls randomness of the simulated user |
| `max_turns` | Maximum number of back-and-forth messages |
| `repeats` | Re-runs each scenario row to capture variance |
| `first_turn` | Whether the `"driver"` or `"target"` starts the conversation |
| `stop_check` | Defines the stopping condition (via a check) |
const startConfig: SessionConfig = {
url: "https://api.example.com/v1/session",
method: "POST",
headers: {
Authorization: "Bearer <TOKEN>",
"Content-Type": "application/json",
},
body: { userId: "<AGENT_ID>" },
response_session_id_path: "response.id",
};
const nextConfig: TurnConfig = {
url: "https://api.example.com/v1/session/messages",
method: "POST",
headers: {
Authorization: "Bearer <TOKEN>",
"Content-Type": "application/json",
},
body: { sessionId: "{session_id}", messages: "{message_history}" },
response_message_path: "response.messages.-1.content",
};
const target: CustomEndpointTarget = {
start_config: startConfig,
next_config: nextConfig,
};
const driverModel = new MultiTurnDriver({
target,
driver_temperature: 0.7,
max_turns: 6,
stop_check: { check_name: "behavior_adherence" },
});
const model = await okareo.register_model({
name: "Custom Endpoint Demo Model",
model: driverModel,
update: true,
});
Driver Parameters
| Parameter | Description |
| --- | --- |
| `driver_temperature` | Controls randomness of the simulated user |
| `max_turns` | Maximum number of back-and-forth messages |
| `repeats` | Re-runs each scenario row to capture variance |
| `first_turn` | Whether the `"driver"` or `"target"` starts the conversation |
| `stop_check` | Defines the stopping condition (via a check) |
2 · Create a Scenario
A Scenario defines what should happen in each simulation run. Think of it as a test case matrix.
A Scenario is made up of one or more Scenario Rows.
Each row supplies runtime parameters that are inserted into the Driver Prompt, plus an Expected Target Result that Okareo’s checks (like Behavior Adherence) will judge against.
How simulation count works:
The total number of simulations = Number of Scenario Rows × Repeats (from the Setting Profile)
Examples:
- 1 Scenario Row × Repeats = 1 → 1 simulation
- 2 Scenario Rows × Repeats = 1 → 2 simulations
- 2 Scenario Rows × Repeats = 2 → 4 simulations (2 runs per row)
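In code form, the count is just a multiplication:

```python
def total_simulations(scenario_rows: int, repeats: int) -> int:
    """Total simulation runs = number of scenario rows x repeats."""
    return scenario_rows * repeats

assert total_simulations(1, 1) == 1
assert total_simulations(2, 1) == 2
assert total_simulations(2, 2) == 4  # 2 runs per row
```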
- Okareo UI
- Python
- TypeScript
- Switch to the Scenarios sub‑tab.
- Click + New Scenario and fill in:
- Driver Parameters – context that will be given to the driver for the simulation run.
- Expected Target Behavior – what success looks like for the target (e.g., “Explains policy & offers label”).
# Import path assumes the Okareo Python SDK; adjust to your installed version.
from okareo_api_client.models import ScenarioSetCreate, SeedData

seeds = [
SeedData(
input_="Hi, I need to return a pair of shoes. What do I do?",
result="Agent explains return policy and offers a label.",
),
SeedData(
input_="Your site keeps crashing. Why?",
result="Agent apologises and asks for details.",
),
]
scenario = okareo.create_scenario_set(
ScenarioSetCreate(
name="Return Policy & Stability",
seed_data=seeds,
)
)
const seeds = [
{
input: "Hi, I need to return a pair of shoes. What do I do?",
result: "Agent explains return policy and offers a label.",
},
{
input: "Your site keeps crashing. Why?",
result: "Agent apologises and asks for details.",
},
];
const scenario = await okareo.create_scenario_set({
name: "Return Policy & Stability",
seed_data: seeds,
});
3 · Launch a Simulation
- Okareo UI
- Python
- TypeScript
- Switch to the Simulations sub-tab.
- Click + New Simulation → select Setting Profile, Scenario, and Checks.
- Click Create. You can watch the progress of the simulation.
# Import path assumes the Okareo Python SDK; adjust to your installed version.
from okareo_api_client.models.test_run_type import TestRunType

run = model.run_test(
name="Endpoint Demo Run",
scenario=scenario,
test_run_type=TestRunType.MULTI_TURN,
checks=["behavior_adherence"],
)
print("View the run ➜", run.app_link)
const testRun = await model.run_test({
name: "Endpoint Demo Run",
scenario_id: scenario.scenario_id,
type: TestRunType.MULTI_TURN,
checks: ["behavior_adherence"],
});
console.log("View the run ➜", testRun.app_link);
4 · Inspect Results
Click a Simulation tile to open its details. The results page breaks down the simulation into:
- Conversation Transcript – View the full back-and-forth between the Driver and Target, one turn per row.
- Checks – See results for:
- Behavior Adherence – Did the assistant stay in character or follow instructions?
- Model Refusal – Did the assistant properly decline off-topic or adversarial inputs?
- Task Completed – Did it fulfill the main objective?
- A custom check specific to your agent
Each turn is annotated with check results, so you can trace where things went wrong — or right.
That's it! You now have a complete, repeatable workflow for evaluating agents with multi-turn simulations, entirely from the browser or your codebase.