okareo

BaseGenerationSchema Objects

class BaseGenerationSchema(PydanticBaseModel)

A base schema class for specifying structured outputs to synthetic data generators.

Okareo Objects

class Okareo()

A class for interacting with Okareo API and for formatting request data.

seed_data_from_list

@staticmethod
def seed_data_from_list(data_list: List[SeedDataRow]) -> List[SeedData]

Create a list of SeedData objects from a list of dictionaries.

Each dictionary in the input list must have 'input' and 'result' keys.

Arguments:

  • data_list List[SeedDataRow] - A list of dictionaries, where each dictionary contains 'input' and 'result' keys.

Returns:

  • List[SeedData] - A list of SeedData objects created from the input dictionaries.
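Because every row must carry both keys, it can help to validate rows before passing them to seed_data_from_list. The following is a minimal standalone sketch; check_seed_rows is a hypothetical helper and not part of the Okareo SDK, and the SDK call itself appears only as a comment.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("input", "result")


def check_seed_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return rows unchanged if each one has 'input' and 'result'; raise KeyError otherwise."""
    for index, row in enumerate(rows):
        missing = [key for key in REQUIRED_KEYS if key not in row]
        if missing:
            raise KeyError(f"row {index} is missing keys: {missing}")
    return rows


rows = check_seed_rows([
    {"input": "What is the capital of France?", "result": "Paris"},
    {"input": {"animal": "dog", "color": "blue"}, "result": "blue"},
])
# With the SDK installed (not executed here):
# seed_data = Okareo.seed_data_from_list(rows)
```

Failing early on a malformed row keeps the error close to the data that caused it, rather than surfacing later during scenario creation.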

get_projects

def get_projects() -> List[ProjectResponse]

Get a list of all Okareo projects available to the user.

Returns:

  • List[ProjectResponse] - A list of ProjectResponse objects accessible to the user.

Raises:

  • TypeError - If the API response is an error.
  • ValueError - If no response is received from the API.

create_project

def create_project(name: str,
tags: Union[Unset, List[str]] = UNSET) -> ProjectResponse

Create a new Okareo project.

Arguments:

  • name str - The name of the new project.
  • tags Union[Unset, List[str]], optional - Optional list of tags to associate with the project.

Returns:

  • ProjectResponse - The created ProjectResponse object.

Raises:

  • TypeError - If the API response is an error.
  • ValueError - If no response is received from the API.
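get_projects and create_project can be combined into a "get or create" pattern. The sketch below is one possible way to do that. It assumes ProjectResponse exposes a name attribute; FakeProject and FakeClient are hypothetical in-memory stand-ins so the example runs without an API key, and get_or_create_project is not an SDK method.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


def get_or_create_project(client: Any, name: str, tags: Optional[List[str]] = None) -> Any:
    """Return the project called `name`, creating it only if it does not exist yet."""
    for project in client.get_projects():
        if project.name == name:  # assumes the response object has a `name` attribute
            return project
    if tags is None:
        return client.create_project(name=name)
    return client.create_project(name=name, tags=tags)


# Hypothetical stand-ins that mimic only the two methods used above.
@dataclass
class FakeProject:
    name: str
    id: str


@dataclass
class FakeClient:
    projects: List[FakeProject] = field(default_factory=list)

    def get_projects(self) -> List[FakeProject]:
        return list(self.projects)

    def create_project(self, name: str, tags: Optional[List[str]] = None) -> FakeProject:
        project = FakeProject(name=name, id=f"fake-{len(self.projects) + 1}")
        self.projects.append(project)
        return project


client = FakeClient()
first = get_or_create_project(client, "demo-project", tags=["docs"])
second = get_or_create_project(client, "demo-project")  # reuses the existing project
```

With a real Okareo instance in place of FakeClient, the same helper would call the API; the TypeError and ValueError cases listed above would then propagate to the caller.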

register_model

def register_model(
name: str,
tags: Union[List[str], None] = None,
project_id: Union[str, UUID, None] = None,
model: Union[None, BaseModel, List[BaseModel]] = None,
update: bool = False,
sensitive_fields: Union[List[str], None] = None) -> ModelUnderTest

Register a new Model Under Test (MUT) to use in an Okareo evaluation.

Arguments:

  • name str - The name of the model. Model names must be unique within a project. Using the same name will return or update the existing model.
  • tags Union[List[str], None], optional - Optional list of tags to associate with the model.
  • project_id Union[str, UUID, None], optional - The project ID to associate the model with.
  • model Union[None, BaseModel, List[BaseModel]], optional - The model or list of models to register.
  • update bool, optional - Whether to update an existing model with the same name. Defaults to False.
  • sensitive_fields List[str], optional - A list of sensitive fields to mask in the model parameters. Defaults to None.

Returns:

  • ModelUnderTest - The registered ModelUnderTest object.

Raises:

  • TypeError - If the API response is an error.
  • ValueError - If no response is received from the API.

get_model

def get_model(name: str, version: str | int = "latest") -> ModelUnderTest

Fetch a model under test based on the name and version.

Arguments:

  • name str - The name of the model to fetch.
  • version str | int, optional - The version of the model to fetch. Defaults to "latest".

Returns:

  • ModelUnderTest - The model under test that matches the given name and version.

create_scenario_set

def create_scenario_set(
create_request: ScenarioSetCreate) -> ScenarioSetResponse

Create a new scenario set to use in an Okareo evaluation or as a seed for synthetic data generation.

Arguments:

  • create_request ScenarioSetCreate - The request object containing the scenario set details and seed data, such as the name and seed_data fields used in the example below.

Returns:

  • ScenarioSetResponse - The created ScenarioSetResponse object.

Raises:

  • ValueError - If the seed data is empty or if no response is received from the API.
  • TypeError - If the API response is an error.

Example:

seed_data = okareo_client.seed_data_from_list([
{"input": {"animal": "fish", "color": "red"}, "result": "red"},
{"input": {"animal": "dog", "color": "blue"}, "result": "blue"},
{"input": {"animal": "cat", "color": "green"}, "result": "green"}
])
create_request = ScenarioSetCreate(name="My Scenario Set", seed_data=seed_data)
okareo_client.create_scenario_set(create_request)

upload_scenario_set

def upload_scenario_set(
scenario_name: str,
file_path: str,
project_id: Union[Unset, str, UUID] = UNSET) -> ScenarioSetResponse

Upload a file as a scenario set to use in an Okareo evaluation or as a seed for synthetic data generation.

Arguments:

  • scenario_name str - The name to assign to the uploaded scenario set.
  • file_path str - The path to the file to upload.
  • project_id Union[Unset, str, UUID], optional - The project ID to associate with the scenario set.

Returns:

  • ScenarioSetResponse - The created ScenarioSetResponse object.

Raises:

  • UnexpectedStatus - If the API returns an unexpected status.
  • TypeError - If the API response is an error.
  • ValueError - If no response is received from the API.

Example:

project_id = "your_project_id"  # Optional, can be None
okareo_client.upload_scenario_set(
scenario_name="My Uploaded Scenario Set",
file_path="/path/to/scenario_set_file.json",
project_id=project_id or None,
)

download_scenario_set

def download_scenario_set(scenario: Union[ScenarioSetResponse, str],
file_path: str = "") -> Any

Download a scenario set from Okareo to the client's local filesystem.

Arguments:

  • scenario Union[ScenarioSetResponse, str] - The scenario set to download, given as a ScenarioSetResponse or as a string identifier.
  • file_path str, optional - The path where the file will be saved. If not provided, the scenario set name is used.

Returns:

  • File - The downloaded file object.

Example:

# scenario_set is a ScenarioSetResponse, e.g. the value returned by create_scenario_set
response_file = okareo_client.download_scenario_set(scenario_set)
with open(response_file.name) as scenario_file:
    for line in scenario_file:
        print(line)

generate_scenarios

def generate_scenarios(
source_scenario: Union[str, UUID, ScenarioSetResponse],
name: str,
number_examples: int,
project_id: Union[Unset, str, UUID] = UNSET,
generation_type: Union[Unset,
ScenarioType] = ScenarioType.REPHRASE_INVARIANT
) -> ScenarioSetResponse

Generate a synthetic scenario set based on an existing seed scenario.

Arguments:

  • source_scenario Union[str, UUID, ScenarioSetResponse] - The source scenario set or its ID to generate from.
  • name str - The name for the new generated scenario set.
  • number_examples int - The number of synthetic examples to generate per seed scenario row.
  • project_id Union[Unset, str, UUID], optional - The project ID to associate with the generated scenario set.
  • generation_type Union[Unset, ScenarioType], optional - The type of scenario generation to use.

Returns:

  • ScenarioSetResponse - The generated synthetic scenario set.

Raises:

  • TypeError - If the API response is an error.
  • ValueError - If no response is received from the API.

Example:

source_scenario = "source_scenario_id"  # or ScenarioSetResponse object
generated_set = okareo_client.generate_scenarios(
source_scenario=source_scenario,
name="Generated Scenario Set",
number_examples=100,
project_id="your_project_id",
generation_type=ScenarioType.REPHRASE_INVARIANT
)
print(generated_set.app_link) # Prints the link to the generated scenario set

generate_scenario_set

def generate_scenario_set(
create_request: ScenarioSetGenerate) -> ScenarioSetResponse

Generate a synthetic scenario set based on an existing seed scenario and a ScenarioSetGenerate object. Offers more controls than the comparable generate_scenarios method.

Arguments:

  • create_request ScenarioSetGenerate - The request object specifying scenario generation parameters.

Returns:

  • ScenarioSetResponse - The generated synthetic scenario set.

Example:

generate_request = ScenarioSetGenerate(
source_scenario_id="seed_scenario_id",
name="My Synthetic Scenario Set",
number_examples=50,
project_id="your_project_id",
generation_type=ScenarioType.REPHRASE_INVARIANT,
)
generated_set = okareo_client.generate_scenario_set(generate_request)
print(generated_set.app_link) # Prints the link to the generated scenario set

get_scenario_data_points

def get_scenario_data_points(
scenario_id: Union[str, UUID]) -> List[ScenarioDataPoinResponse]

Fetch the scenario data points associated with a scenario set with scenario_id.

Arguments:

  • scenario_id Union[str, UUID] - The ID of the scenario set to fetch data points for.

Returns:

  • List[ScenarioDataPoinResponse] - A list of scenario data point responses associated with the scenario set.

Example:

okareo_client = Okareo(api_key="your_api_key")
scenario_id = "your_scenario_id"
data_points = okareo_client.get_scenario_data_points(scenario_id)
for dp in data_points:
    print(dp.input_, dp.result)

find_test_data_points

def find_test_data_points(
test_data_point_payload: FindTestDataPointPayload
) -> List[Union[TestDataPointItem, FullDataPointItem]]

Fetch the test run data points that match the criteria specified in the payload.

Arguments:

  • test_data_point_payload FindTestDataPointPayload - The payload specifying the test data point search criteria.

Returns:

  • List[Union[TestDataPointItem, FullDataPointItem]] - A list of test or full data point items.

Raises:

  • TypeError - If the API response is an error.

Example:

from okareo_api_client.models.find_test_data_point_payload import (
FindTestDataPointPayload,
)

test_run_id = "your_test_run_id" # Replace with your actual test run ID
payload = FindTestDataPointPayload(
test_run_id=test_run_id,
)
data_points = okareo_client.find_test_data_points(payload)
for dp in data_points:
    print(dp)

find_datapoints

def find_datapoints(
datapoint_search: DatapointSearch) -> List[DatapointListItem]

Fetch the datapoints specified by a Datapoint Search.

Arguments:

  • datapoint_search DatapointSearch - The search criteria for fetching datapoints.

Returns:

  • List[DatapointListItem] - A list of datapoint items matching the search.

Raises:

  • TypeError - If the API response is an error.

Example:

from okareo_api_client.models.datapoint_search import DatapointSearch

# Search based on a test run ID
test_run_id = "your_test_run_id"  # Replace with your actual test run ID
search = DatapointSearch(
    test_run_id=test_run_id,
)
datapoints = okareo_client.find_datapoints(search)
for dp in datapoints:
    print(dp)

# Search based on a context token from a logger
# random_string is a placeholder for any helper that returns a unique string
context_token = random_string(10)
logger_config = {
    "api_key": "<API_KEY>",
    "tags": ["logger-test"],
    "context_token": context_token,
}
# Use the logger config to log completions from CrewAI or Autogen
...

# Search for the logged datapoints by the context token
search = DatapointSearch(
    context_token=context_token,
)
datapoints = okareo_client.find_datapoints(search)
for dp in datapoints:
    print(dp)

find_datapoints_filter

def find_datapoints_filter(
datapoint_search: DatapointFilterSearchPayload
) -> List[DatapointListItem]

Fetch the datapoints that match the filters in a DatapointFilterSearchPayload.

Arguments:

  • datapoint_search DatapointFilterSearchPayload - The search criteria for fetching datapoints.

Returns:

  • List[DatapointListItem] - A list of datapoint items matching the search.

Raises:

  • TypeError - If the API response is an error.

Example:

from okareo_api_client.models.datapoint_filter_search_payload import DatapointFilterSearchPayload

# Search based on a test run ID
test_run_id = "your_test_run_id"  # Replace with your actual test run ID
search = DatapointFilterSearchPayload(
    test_run_id=test_run_id,
)
datapoints = okareo_client.find_datapoints_filter(search)
for dp in datapoints:
    print(dp)

# Find datapoints based on filters on datapoint fields
from okareo_api_client.models.comparison_operator import ComparisonOperator
# Assumed module path for DatapointField, following the client's naming pattern
from okareo_api_client.models.datapoint_field import DatapointField
from okareo_api_client.models.filter_condition import FilterCondition

search = DatapointFilterSearchPayload(
    filters=[FilterCondition(
        field=DatapointField.TEST_RUN_ID,
        operator=ComparisonOperator.EQUAL,
        value=test_run_id,
    )]
)
datapoints = okareo_client.find_datapoints_filter(search)
for dp in datapoints:
    print(dp)

generate_check

def generate_check(
create_check: EvaluatorSpecRequest) -> EvaluatorGenerateResponse

Generate the contents of a Check based on an EvaluatorSpecRequest. Can be used to generate a behavioral (model-based) or a deterministic (code-based) check. Check names must be unique within a project.

Arguments:

  • create_check EvaluatorSpecRequest - The specification for the check to generate.

Returns:

  • EvaluatorGenerateResponse - The generated check response.

Example:

from okareo import Okareo
from okareo.checks import CheckOutputType, ModelBasedCheck
from okareo_api_client.models.evaluator_spec_request import EvaluatorSpecRequest

# Generate a behavioral model-based check
spec = EvaluatorSpecRequest(
    description="Checks if the output contains toxic language.",
    requires_scenario_input=False,
    requires_scenario_result=False,
    output_data_type="bool",  # bool, int, float
)
okareo_client = Okareo(api_key="your_api_key")
generated_check = okareo_client.generate_check(spec)

# Inspect the generated check to ensure it meets your requirements
print(generated_check)

# Upload the generated check to Okareo to use in evaluations
toxicity_check = okareo_client.create_or_update_check(
    name="toxicity_check",
    description=generated_check.description,
    check=ModelBasedCheck(  # type: ignore
        prompt_template=generated_check.generated_prompt,
        check_type=CheckOutputType.PASS_FAIL,
    ),
)
# Inspect the uploaded check
print(toxicity_check)

get_all_checks

def get_all_checks(all_versions: bool = False) -> List[EvaluatorBriefResponse]

Fetch all available checks.

Arguments:

  • all_versions - If True, return all versions of every check (full version history). Defaults to False (latest version only).

Returns:

  • List[EvaluatorBriefResponse] - A list of EvaluatorBriefResponse objects representing all available checks.

Example:

checks = okareo_client.get_all_checks()
for check in checks:
    print(check.name, check.id)

# Include full version history
all_checks = okareo_client.get_all_checks(all_versions=True)

get_check

def get_check(
check_id: Union[str, UUID],
version: Union[str, int, None] = None) -> EvaluatorDetailedResponse

Fetch details for a specific check by UUID or by name.

Arguments:

  • check_id - A check UUID (str or UUID object) or a check name (str). When a name is given the method resolves it to a UUID via the list endpoint.
  • version - Optional version number or the string "latest". Only used when check_id is a name. None and "latest" both resolve to the most recent version.

Returns:

  • EvaluatorDetailedResponse - The detailed response for the specified check.

Raises:

  • ValueError - If no check matches the given name/version.

Example:

# By UUID (existing behaviour)
check = okareo_client.get_check("your_check_uuid")

# By name (latest version)
check = okareo_client.get_check("my_check")
check = okareo_client.get_check("my_check", version="latest")

# By name + pinned version
check = okareo_client.get_check("my_check", version=1)
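The name and version rules described above can be illustrated with a small standalone sketch. This is not the SDK's implementation: resolve_check is a hypothetical function, and the listing uses plain dictionaries with assumed name, version, and id keys in place of the objects returned by the list endpoint.

```python
from typing import Any, Dict, List, Union


def resolve_check(checks: List[Dict[str, Any]], name: str,
                  version: Union[str, int, None] = None) -> Dict[str, Any]:
    """Pick one entry for `name`; None and "latest" both select the highest version."""
    candidates = [check for check in checks if check["name"] == name]
    if not candidates:
        raise ValueError(f"no check named {name!r}")
    if version is None or version == "latest":
        return max(candidates, key=lambda check: check["version"])
    wanted = int(version)
    for check in candidates:
        if check["version"] == wanted:
            return check
    raise ValueError(f"check {name!r} has no version {wanted}")


# Hypothetical listing used only for illustration.
listing = [
    {"name": "my_check", "version": 1, "id": "uuid-v1"},
    {"name": "my_check", "version": 2, "id": "uuid-v2"},
    {"name": "other_check", "version": 1, "id": "uuid-other"},
]
latest = resolve_check(listing, "my_check")
pinned = resolve_check(listing, "my_check", version=1)
```

As in get_check, an unknown name or an unknown version ends in a ValueError rather than a silent fallback.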

delete_check

def delete_check(check_id: Union[str, UUID], check_name: str) -> str

Deletes a check identified by its ID and name.

Arguments:

  • check_id Union[str, UUID] - The unique identifier of the check to delete.
  • check_name str - The name of the check to delete.

Returns:

  • str - A message indicating the result of the deletion.

Example:

result = okareo_client.delete_check(check_id="abc123", check_name="MyCheck")
print(result) # Output: Check deletion was successful

create_or_update_check

def create_or_update_check(
name: str,
description: str,
check: BaseCheck,
tags: Optional[List[str]] = None) -> EvaluatorDetailedResponse

Create a new check or update an existing one. If a check named 'name' already exists, this method updates it; otherwise, it creates a new check.

Arguments:

  • name str - The unique name of the check to create or update.
  • description str - A human-readable description of the check.
  • check BaseCheck - An instance of BaseCheck containing the check configuration.
  • tags - Optional list of string tags to associate with the check.

Returns:

  • EvaluatorDetailedResponse - The detailed response from the evaluator after creating or updating the check.

Raises:

  • AssertionError - If the response is not an instance of EvaluatorDetailedResponse.
  • ValueError - If the response validation fails.

Example:

from okareo.checks import CheckOutputType, ModelBasedCheck

my_check = ModelBasedCheck(
    prompt_template="Only output the number of words in the following text: {scenario_input} {generation}",
    check_type=CheckOutputType.SCORE,  # the prompt asks for a number, so a score output fits better than pass/fail
)

response = okareo_client.create_or_update_check(
name="my_word_count_check",
description="Custom check for counting combined total number of words in input and output.",
check=my_check,
tags=["prod", "v1"],
)

print(response)

create_trace_eval

def create_trace_eval(group: Any, context_token: str) -> Any

Create a trace evaluation for a group.

Arguments:

  • group Any - The group, or the ID of the group, to create the trace evaluation for.
  • context_token str - The context token for the trace.

Returns:

The created trace evaluation details.

Raises:

  • OkareoAPIException - If the API request fails.

evaluate

def evaluate(name: str,
test_run_type: TestRunType,
scenario_id: Union[Unset, str] = UNSET,
datapoint_ids: Union[Unset, list[str]] = UNSET,
filter_group_id: Union[Unset, str] = UNSET,
tags: Union[Unset, list[str]] = UNSET,
metrics_kwargs: Union[Dict[str, Any], Unset] = UNSET,
checks: Union[Unset, list[str]] = UNSET) -> TestRunItem

Evaluate datapoints using the specified parameters.

Arguments:

  • scenario_id - ID of the scenario set
  • metrics_kwargs - Dictionary of metrics to be measured
  • name - Name of the test run
  • test_run_type - Type of test run
  • tags - Tags for filtering test runs
  • checks - List of checks to include
  • datapoint_ids - List of datapoint IDs to filter by
  • filter_group_id - ID of the datapoint filter group to apply

Returns:

  • TestRunItem - The evaluation results as a TestRunItem object.

Example:

checks = ["model_refusal"]  # one or more checks to apply in the evaluation
test_run = okareo.evaluate(
name="My Test Run",
test_run_type=TestRunType.NL_GENERATION,
checks=checks,
datapoint_ids=["datapoint_id_1", "datapoint_id_2"],
)
print(test_run.app_link) # View link to eval results in Okareo app

create_or_update_driver

def create_or_update_driver(driver: Driver) -> Driver

Create or update a driver.

Arguments:

  • driver - The driver to create or update.

Returns:

The created or updated driver.

get_driver_by_name

def get_driver_by_name(driver_name: str) -> Driver

Get a driver by its name.

Arguments:

  • driver_name - The name of the driver to retrieve.

Returns:

The driver with the specified name.

create_or_update_target

def create_or_update_target(
target: Target,
tags: Optional[List[str]] = None,
project_id: Optional[str] = None,
sensitive_fields: Union[List[str], None] = None) -> Target

Create or update a target.

Arguments:

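  • target - The target to create or update.
  • tags - Optional list of tags to associate with the target.
  • project_id - Optional project ID to associate the target with.
  • sensitive_fields - Optional list of sensitive fields to mask in the target parameters.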

Returns:

The created or updated target.

generate_driver_prompt

def generate_driver_prompt(user_input: str,
prior_prompt: Optional[str] = None,
language: Optional[str] = None,
**driver_kwargs: Any) -> Driver

Generate a structured driver prompt from a one-sentence description.

Arguments:

  • user_input - Natural language description of the caller persona.
  • prior_prompt - Optional existing prompt to refine.
  • language - BCP-47 language code (e.g. "en", "es", "fr-CA").
  • **driver_kwargs - Extra fields forwarded to the returned Driver (e.g. voice_instructions, temperature, voice).

Returns:

A Driver with the AI-generated name and prompt_template.

find_test_runs

def find_test_runs(name: Optional[str] = None,
tags: Optional[list] = None,
project_id: Optional[str] = None,
return_model_metrics: bool = False) -> list

Find test runs, optionally filtering by name or tags.

Arguments:

  • name - Filter results to runs with this exact name (client-side).
  • tags - Filter results to runs with these tags (server-side).
  • project_id - Scope to a specific project.
  • return_model_metrics - Include model_metrics in the response.

Returns:

List of test run dicts from the server.
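Since the name filter is applied client-side as an exact match, its effect can be sketched on a plain list of dictionaries. The helper below is illustrative only: filter_runs_by_name is a hypothetical function, and the id and name keys in the sample data are assumptions about the shape of the returned dicts.

```python
from typing import Any, Dict, List, Optional


def filter_runs_by_name(runs: List[Dict[str, Any]], name: Optional[str]) -> List[Dict[str, Any]]:
    """Keep only runs whose 'name' equals `name` exactly; None disables the filter."""
    if name is None:
        return list(runs)
    return [run for run in runs if run.get("name") == name]


# Hypothetical server response used only for illustration.
server_runs = [
    {"id": "run-1", "name": "nightly"},
    {"id": "run-2", "name": "nightly-eu"},
    {"id": "run-3", "name": "nightly"},
]
matches = filter_runs_by_name(server_runs, "nightly")
```

Exact matching means "nightly-eu" is excluded when filtering for "nightly"; prefix or substring matching would need a separate step on the caller's side.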

re_evaluate

def re_evaluate(test_run_id: str,
checks: list,
name: Optional[str] = None,
tags: Optional[list] = None) -> TestRunItem

Re-evaluate an existing test run with different checks.

No new simulation or phone call is made. The existing conversation data is re-scored with the specified checks.

Arguments:

  • test_run_id - ID of the source test run to re-evaluate.
  • checks - List of check names or IDs to apply.
  • name - Optional name for the new re-evaluated run.
  • tags - Optional tags for the new run.

Returns:

The newly created TestRunItem with re-evaluated results.

download_call_recording

def download_call_recording(call_sid: str) -> bytes

Download a voice call recording by its Twilio CallSid.

Arguments:

  • call_sid - The Twilio CallSid from datapoint metadata (e.g. dp.model_metadata.additional_properties["call_sid"]).

Returns:

Raw WAV audio bytes.
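The returned bytes can be written straight to disk. The sketch below shows one way to sanity-check that the data looks like WAV before saving it. save_recording and make_silent_wav are hypothetical helpers; the silent clip stands in for the bytes that download_call_recording would return, so the example runs without network access.

```python
import io
import tempfile
import wave
from pathlib import Path


def save_recording(audio: bytes, path: Path) -> Path:
    """Write WAV bytes to `path` after checking the RIFF/WAVE header markers."""
    if audio[:4] != b"RIFF" or audio[8:12] != b"WAVE":
        raise ValueError("expected WAV data (RIFF/WAVE header not found)")
    path.write_bytes(audio)
    return path


def make_silent_wav(frames: int = 800, rate: int = 8000) -> bytes:
    """Build a short mono 16-bit silent clip in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * frames)
    return buffer.getvalue()


# Stand-in for: audio = okareo_client.download_call_recording(call_sid)
audio = make_silent_wav()
saved_path = save_recording(audio, Path(tempfile.mkdtemp()) / "call.wav")
with wave.open(str(saved_path), "rb") as wav_file:
    frame_count = wav_file.getnframes()
```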

download_voice

def download_voice(file_url: str) -> bytes

Download a voice file from Okareo.

Files are always stored as MP3 on the server.

Arguments:

  • file_url - The URL of the voice file to download.

Returns:

Raw MP3 audio bytes.

create_scenario_set_with_audio_files

def create_scenario_set_with_audio_files(
name: str,
data_list: List[Dict[str, str]],
project_id: Optional[Union[str, UUID]] = None) -> ScenarioSetResponse

Upload local audio files and create a scenario set in one call.

Each item's 'input' field should be a local file path to a WAV or MP3 file. The file is uploaded to Okareo (coerced to MP3), and 'input' is replaced with the returned file URL before creating the scenario.

Arguments:

  • name - Scenario set name.
  • data_list - List of dicts with 'input' (local file path) and 'result' (expected transcript string).
  • project_id - Optional project to associate files and scenario with.

Returns:

ScenarioSetResponse from the created scenario set.
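The data_list argument can be assembled from a mapping of file paths to expected transcripts. The following is a minimal sketch under that assumption; build_audio_rows is a hypothetical helper, the file paths are placeholders, and the SDK call appears only as a comment.

```python
from pathlib import Path
from typing import Dict, List

SUPPORTED_SUFFIXES = {".wav", ".mp3"}


def build_audio_rows(transcripts: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn {file_path: transcript} into rows with 'input' and 'result' keys."""
    rows = []
    for file_path, transcript in transcripts.items():
        suffix = Path(file_path).suffix.lower()
        if suffix not in SUPPORTED_SUFFIXES:
            raise ValueError(f"{file_path}: expected a .wav or .mp3 file, got {suffix!r}")
        rows.append({"input": file_path, "result": transcript})
    return rows


data_list = build_audio_rows({
    "recordings/greeting.wav": "Hello, thanks for calling.",
    "recordings/billing.MP3": "I have a question about my invoice.",
})
# With the SDK installed and the files present (not executed here):
# okareo_client.create_scenario_set_with_audio_files(name="Audio scenarios", data_list=data_list)
```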

ingest_conversations

def ingest_conversations(
project_id: Union[str, UUID],
conversations: List[Dict[str, Any]],
mut_id: Union[str, UUID, None] = None) -> Dict[str, Any]

Ingest voice conversations for monitoring.

Accepts one or more conversations from voice platforms (Retell, Twilio, VAPI, etc.) and enqueues them for async processing. Each conversation's turns will become Datapoint rows, and configured monitors will automatically match and run checks.

This is the monitoring path, not the simulation path. No ScenarioSets are created. The mut_id is optional - when omitted, datapoints are created without MUT association and rely entirely on monitor/filter group matching.

Arguments:

  • project_id - Okareo project ID.
  • conversations - List of conversation dictionaries, each containing:
    • source_platform (str): Platform source ('retell', 'twilio', 'vapi', 'elevenlabs', or 'custom')
    • call_id (str): Platform-specific call identifier
    • context_token (str, optional): Context token for correlation (defaults to call_id)
    • audio (dict, optional): Preferred audio shape with one of:
      • {"type": "url", "url": "https://..."}
      • {"type": "voice_file_id", "voice_file_id": "uuid"}
      • {"type": "inline_b64", "inline_b64": "..."}
    • recording_url (str, optional): Legacy compatibility alias for audio URL
    • recording_bytes_b64 (str, optional): Legacy compatibility alias for inline base64 audio
    • transcript (list, optional): Pre-parsed transcript as list of turns with 'role' and 'content'
    • diarization (bool, optional): When transcript is absent, controls whether Okareo runs diarization + ASR. Defaults to True.
    • metadata (dict, optional): Platform-specific metadata
    • tags (list, optional): Tags for monitor matching
    • first_turn (str, optional): For audio-only diarization ('user' or 'assistant' spoke first, defaults to 'assistant')
  • mut_id - Optional model under test ID. If not provided, datapoints are created without MUT association (monitoring path).

Returns:

Dict with 'status' and list of conversation identifiers.

Raises:

  • httpx.HTTPStatusError - If the API returns an error status.

Example (monitoring-only, no MUT):

okareo.ingest_conversations(
project_id="your-project-id",
conversations=[
{
"source_platform": "retell",
"call_id": "call-123",
"audio": {
"type": "url",
"url": "https://retell.ai/recordings/call-123.mp3",
},
"tags": ["support", "billing"],
"metadata": {"customer_id": "cust-456"}
}
]
)

Example (with MUT association):

okareo.ingest_conversations(
project_id="your-project-id",
mut_id="your-mut-id",
conversations=[
{
"source_platform": "custom",
"call_id": "call-456",
"transcript": [
{"role": "user", "content": "Hello", "timestamp_ms": 0},
{"role": "assistant", "content": "Hi, how can I help?", "timestamp_ms": 1000}
],
}
]
)
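Conversation dictionaries can be built and checked before they are sent. The sketch below validates only the two constraints stated in the argument list above: the allowed source_platform values and the three audio shapes. build_conversation is a hypothetical helper, not an SDK function, and it does not cover every optional field.

```python
from typing import Any, Dict, List, Optional

SOURCE_PLATFORMS = {"retell", "twilio", "vapi", "elevenlabs", "custom"}
AUDIO_TYPES = {"url", "voice_file_id", "inline_b64"}


def build_conversation(source_platform: str, call_id: str,
                       audio: Optional[Dict[str, str]] = None,
                       transcript: Optional[List[Dict[str, Any]]] = None,
                       tags: Optional[List[str]] = None) -> Dict[str, Any]:
    """Assemble one conversation dict, rejecting unknown platforms and audio shapes."""
    if source_platform not in SOURCE_PLATFORMS:
        raise ValueError(f"unsupported source_platform: {source_platform!r}")
    conversation: Dict[str, Any] = {"source_platform": source_platform, "call_id": call_id}
    if audio is not None:
        audio_type = audio.get("type")
        # Each audio shape carries its payload under a key named after its type.
        if audio_type not in AUDIO_TYPES or audio_type not in audio:
            raise ValueError(f"unsupported audio shape: {audio!r}")
        conversation["audio"] = audio
    if transcript is not None:
        conversation["transcript"] = transcript
    if tags:
        conversation["tags"] = tags
    return conversation


conversation = build_conversation(
    "retell",
    "call-123",
    audio={"type": "url", "url": "https://retell.ai/recordings/call-123.mp3"},
    tags=["support", "billing"],
)
# With the SDK installed (not executed here):
# okareo.ingest_conversations(project_id="your-project-id", conversations=[conversation])
```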