Skip to main content

okareo.model_under_test

ModelUnderTest Objects

class ModelUnderTest(AsyncProcessorMixin)

A class for managing a Model Under Test (MUT) in Okareo. Returned by okareo.register_model()

submit_test

def submit_test(
scenario: Union[ScenarioSetResponse, str],
name: str,
api_key: Optional[str] = None,
api_keys: Optional[dict] = None,
metrics_kwargs: Optional[dict] = None,
test_run_type: TestRunType = TestRunType.MULTI_CLASS_CLASSIFICATION,
calculate_metrics: bool = True,
checks: Optional[List[str]] = None,
simulation_params: Optional[Any] = None,
driver_id: Optional[str] = None) -> TestRunItem

Asynchronous server-based version of test-run execution. For CustomModels, model invocations are handled client-side then evaluated server-side asynchronously. For other models, model invocations and evaluations handled server-side asynchronously.

Arguments:

  • scenario Union[ScenarioSetResponse, str] - The scenario set or identifier to use for the test run.
  • name str - The name to assign to the test run.
  • api_key Optional[str] - Optional API key for authentication.
  • api_keys Optional[dict] - Optional dictionary of API keys for different services.
  • metrics_kwargs Optional[dict] - Optional dictionary of keyword arguments for metrics calculation.
  • test_run_type TestRunType - The type of test run to execute. Defaults to MULTI_CLASS_CLASSIFICATION.
  • calculate_metrics bool - Whether to calculate metrics after the test run. Defaults to True.
  • checks Optional[List[str]] - Optional list of checks to perform during the test run.

Returns:

  • TestRunItem - The resulting test run item for the submitted test run. The id field can be used to retrieve the test run.

run_test

def run_test(
scenario: Union[ScenarioSetResponse, str],
name: str,
api_key: Optional[str] = None,
api_keys: Optional[dict] = None,
metrics_kwargs: Optional[dict] = None,
test_run_type: TestRunType = TestRunType.MULTI_CLASS_CLASSIFICATION,
calculate_metrics: bool = True,
checks: Optional[List[str]] = None,
simulation_params: Optional[Any] = None,
driver_id: Optional[str] = None) -> TestRunItem

Server-based version of test-run execution. For CustomModels, model invocations are handled client-side then evaluated server-side. For other models, model invocations and evaluations handled server-side.

Arguments:

  • scenario Union[ScenarioSetResponse, str] - The scenario set or identifier to use for the test run.
  • name str - The name to assign to the test run.
  • api_key Optional[str] - Optional API key for authentication.
  • api_keys Optional[dict] - Optional dictionary of API keys for different services.
  • metrics_kwargs Optional[dict] - Optional dictionary of keyword arguments for metrics calculation.
  • test_run_type TestRunType - The type of test run to execute. Defaults to MULTI_CLASS_CLASSIFICATION.
  • calculate_metrics bool - Whether to calculate metrics after the test run. Defaults to True.
  • checks Optional[List[str]] - Optional list of checks to perform during the test run.

Returns:

  • TestRunItem - The resulting test run item for the completed test run.

get_test_run

def get_test_run(test_run_id: str) -> TestRunItem

Retrieve a test run by its ID.

Arguments:

  • test_run_id str - The ID of the test run to retrieve.

Returns:

  • TestRunItem - The test run item corresponding to the provided ID.

ModelInvocation Objects

@_attrs_define
class ModelInvocation()

Model invocation response object returned from a CustomModel.invoke method or as an element of a list returned from a CustomBatchModel.invoke_batch method.

Arguments:

  • model_prediction - Prediction from the model to be used when running the evaluation, e.g. predicted class from classification model or generated text completion from a generative model. This would typically be parsed out of the overall model_output_metadata.
  • model_input - All the input sent to the model.
  • model_output_metadata - Full model response, including any metadata returned with model's output.
  • tool_calls - List of tool calls made during the model invocation, if any.

OpenAIModel Objects

@define
class OpenAIModel(BaseModel)

An OpenAI model definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

  • model_id - Model ID to request from OpenAI completion. For list of available models, see https://platform.openai.com/docs/models
  • temperature - Parameter for controlling the randomness of the model's output.
  • system_prompt_template - System role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • user_prompt_template - User role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}
  • dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.
  • tools - List of tools to pass to the model.

GenerationModel Objects

@define
class GenerationModel(BaseModel)

An LLM definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

  • model_id - Model ID to request for LLM completion.
  • temperature - Parameter for controlling the randomness of the model's output.
  • system_prompt_template - System role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • user_prompt_template - User role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}
  • dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.
  • tools - List of tools to pass to the model.

OpenAIAssistantModel Objects

@_attrs_define
class OpenAIAssistantModel(BaseModel)

An OpenAI Assistant definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

  • model_id - Assistant ID to request to run a thread against.
  • assistant_prompt_template - System role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}.
  • user_prompt_template - User role prompt template to pass to the model. Uses mustache syntax for variable substitution, e.g. {scenario_input}
  • dialog_template - Dialog template in OpenAI message format to pass to the model. Uses mustache syntax for variable substitution.

CohereModel Objects

@_attrs_define
class CohereModel(BaseModel)

A Cohere model definition with prompt template and relevant parameters for an Okareo evaluation.

Arguments:

PineconeDb Objects

@_attrs_define
class PineconeDb(BaseModel)

A Pinecone vector database configuration for use in an Okareo retrieval evaluation.

Arguments:

  • index_name - The name of the Pinecone index to connect to.
  • region - The region where the Pinecone index is hosted.
  • project_id - The project identifier associated with the Pinecone index.
  • top_k - The number of top results to retrieve for queries. Defaults to 5.

QdrantDB Objects

@_attrs_define
class QdrantDB(BaseModel)

A Qdrant vector database configuration for use in an Okareo retrieval evaluation.

Arguments:

  • collection_name - The name of the Qdrant collection to connect to.
  • url - The URL of the Qdrant instance.
  • top_k - The number of top results to retrieve for queries. Defaults to 5.
  • sparse - Whether to use sparse vectors for the Qdrant collection. Defaults to False.

CustomModel Objects

@_attrs_define
class CustomModel(BaseModel)

A custom model definition for an Okareo evaluation. Requires a valid invoke definition that operates on a single input.

Arguments:

  • name - A name for the custom model.

invoke

@abstractmethod
def invoke(input_value: Union[dict, list, str]) -> Union[ModelInvocation, Any]

Method for taking a single scenario input and returning a single model output

Arguments:

  • input_value - Union[dict, list, str] - input to the model.

Returns:

Union[ModelInvocation, Any] - model output. If the model returns a ModelInvocation, it should contain the model's prediction, input, and metadata. If the model returns a tuple, the first element should be the model's prediction and the second element should be the metadata.

CustomMultiturnTarget Objects

@_attrs_define
class CustomMultiturnTarget(BaseModel)

A custom model definition for an Okareo multiturn evaluation. Requires a valid invoke definition that operates on a single turn of a converstation.

start_session

def start_session(
scenario_input: str | None = None
) -> tuple[str | None, ModelInvocation | None]

Method for starting a multiturn conversation with a custom model

Returns:

  • str | None: session_id - the ID of the session started by the model.
  • ModelInvocation | None: model output - the model's response to the session start, if any.

end_session

def end_session(session_id: str) -> None

Method for ending a multiturn conversation with a custom model

Arguments:

  • session_id - str - the ID of the session to end.

invoke

@abstractmethod
def invoke(messages: List[dict[str, str]],
scenario_input: Optional[Union[dict, list, str]] = None,
session_id: Optional[str] = None) -> Union[ModelInvocation, Any]

Method for continuing a multiturn conversation with a custom model

Arguments:

  • messages - list - list of messages in the conversation
  • scenario_input - Optional[dict | list | str] - scenario input for the conversation

Returns:

Union[ModelInvocation, Any] - model output. If the model returns a ModelInvocation, it should contain the model's prediction, input, and metadata. If the model returns a tuple, the first element should be the model's prediction and the second element should be the metadata.

CustomMultiturnTargetAsync Objects

@_attrs_define
class CustomMultiturnTargetAsync(BaseModel)

A custom model definition for an Okareo multiturn evaluation that uses asynchronous methods. Requires a valid invoke definition that operates on a single turn of a converstation.

start_session

async def start_session(
scenario_input: str | None = None
) -> tuple[str | None, ModelInvocation | None]

Method for starting a multiturn conversation with a custom model

Returns:

  • str | None: session_id - the ID of the session started by the model.
  • ModelInvocation | None: model output - the model's response to the session start, if any.

end_session

async def end_session(session_id: str) -> None

Method for ending a multiturn conversation with a custom model

Arguments:

  • session_id - str - the ID of the session to end.

invoke

@abstractmethod
async def invoke(
messages: List[dict[str, str]],
scenario_input: Optional[Union[dict, list, str]] = None,
session_id: Optional[str] = None
) -> Awaitable[Union[ModelInvocation, Any]]

Method for continuing a multiturn conversation with a custom model

Arguments:

  • messages - list - list of messages in the conversation
  • scenario_input - Optional[dict | list | str] - scenario input for the conversation

Returns:

Union[ModelInvocation, Any] - model output. If the model returns a ModelInvocation, it should contain the model's prediction, input, and metadata. If the model returns a tuple, the first element should be the model's prediction and the second element should be the metadata.

VoiceTarget Objects

class VoiceTarget(BaseModel)

Base class for realtime voice targets in Okareo multiturn evaluation.

This target runs server-side during multiturn evaluations and follows the same API key pattern as other targets: API keys are passed via the api_keys parameter in run_simulation().

Notes:

The LocalVoiceTarget class in okareo.voice module is a different implementation that runs locally in the SDK. VoiceTarget subclasses run server-side and are recommended for production evaluations as they provide better scalability and monitoring capabilities.

OpenAIVoiceTarget Objects

@_attrs_define
class OpenAIVoiceTarget(VoiceTarget)

OpenAI Realtime API voice target for Okareo multiturn evaluation.

Arguments:

  • model - Model ID for OpenAI Realtime. Default: "gpt-realtime".
  • instructions - System instructions for the voice agent. Default: "Be brief and helpful."
  • output_voice - Voice ID for TTS output. Options: "alloy", "echo", "fable", "onyx", "nova", "shimmer".

DeepgramVoiceTarget Objects

@_attrs_define
class DeepgramVoiceTarget(VoiceTarget)

Deepgram voice target for Okareo multiturn evaluation.

Arguments:

  • model - Model ID for Deepgram. Default: "aura-2".
  • instructions - System instructions for the voice agent. Default: "Be brief and helpful."
  • output_voice - Voice ID for TTS output. Example: "aura-2-thalia-en".

TwilioVoiceTarget Objects

@_attrs_define
class TwilioVoiceTarget(VoiceTarget)

Twilio voice target for Okareo multiturn evaluation.

Arguments:

  • account_sid - Twilio account SID for authentication.
  • auth_token - Twilio authentication token.
  • from_phone_number - Phone number to call from (Twilio number).
  • to_phone_number - Phone number to call to (destination number).
  • max_parallel_requests - Maximum number of parallel requests the target can handle.

StopConfig Objects

@define
class StopConfig()

Configuration for stopping a multiturn conversation based on a specific check.

Arguments:

  • check_name - Name of the check to use for stopping the conversation.
  • stop_on - The check condition to stop the conversation. Defaults to True (i.e., conversation stops when check evaluates to True).

SessionConfig Objects

class SessionConfig()

Configuration for a custom API endpoint that starts a session.

Arguments:

  • url - URL of the endpoint to start the session.
  • method - HTTP method to use for the request. Defaults to POST.
  • headers - Headers to include in the request. Defaults to an empty JSON object.
  • body - Body to include in the request. Defaults to an empty JSON object.
  • status_code - Expected HTTP status code of the response.
  • response_session_id_path - Path to extract the session ID from the response. E.g., response.id will use the id field of the response JSON object to set the session_id.

TurnConfig Objects

class TurnConfig()

Configuration for a custom API endpoint that continues a session/conversation by one turn.

Arguments:

  • url - URL of the endpoint to start the session.
  • method - HTTP method to use for the request. Defaults to POST.
  • headers - Headers to include in the request. Supports mustache syntax for variable substitution for {latest_message}, {message_history}, {session_id}. Defaults to an empty JSON object.
  • body - Body to include in the request. Supports mustache syntax for variable substitution for {latest_message}, {message_history}, {session_id}. Defaults to an empty JSON object.
  • method1 - Expected HTTP status code of the response.
  • method2 - Path to extract the model's generated message from the response. E.g., method3 will parse out the corresponding field of the response JSON object as the model's generated response.
  • method4 - Path to extract tool calls from the response.

EndSessionConfig Objects

class EndSessionConfig()

Configuration for a custom API endpoint that ends a session.

Arguments:

  • url - URL of the endpoint to start the session.
  • method - HTTP method to use for the request. Defaults to POST.
  • headers - Headers to include in the request. Defaults to an empty JSON object.
  • body - Body to include in the request. Defaults to an empty JSON object.
  • status_code - Expected HTTP status code of the response.
  • response_session_id_path - Path to extract the session ID from the response.

CustomEndpointTarget Objects

class CustomEndpointTarget(BaseModel)

A trio of custom API endpoints for starting a session and continuing a conversation to use in Okareo multiturn evaluation.

Arguments:

  • start_session - A valid SessionConfig for starting a session.
  • next_turn - A valid TurnConfig for requesting and parsing the next turn of a conversation.
  • end_session - A valid EndSessionConfig for ending a session.
  • max_parallel_requests - Maximum number of parallel requests to allow when running the evaluation.

MultiTurnDriver Objects

@_attrs_define
class MultiTurnDriver(BaseModel)

A driver model for Okareo multiturn evaluation.

Arguments:

  • target - Target model under test to use in the multiturn evaluation.
  • stop_check - A valid StopConfig or a dict that can be converted to StopConfig.
  • driver_model_id - Model ID to use for the driver model (e.g., "gpt-4.1").
  • driver_temperature - Parameter for controlling the randomness of the driver model's output.
  • repeats - Number of times to run a conversation per scenario row. Defaults to 1.
  • max_turns - Maximum number of turns to run in a conversation. Defaults to 5.
  • first_turn - Name of model (i.e., "target" or "driver") that should initiate each conversation. Defaults to "target".
  • driver_prompt_template - Optional system prompt template to pass to the driver model. Uses mustache syntax for variable substitution, e.g. {input}.

CustomBatchModel Objects

@_attrs_define
class CustomBatchModel(BaseModel)

A custom batch model definition for an Okareo evaluation. Requires a valid invoke_batch definition that operates on a single input.

invoke_batch

@abstractmethod
def invoke_batch(
input_batch: list[dict[str, Union[dict, list, str]]]
) -> list[dict[str, Union[ModelInvocation, Any]]]

Method for taking a batch of scenario inputs and returning a corresponding batch of model outputs

Arguments:

  • input_batch - list[dict[str, Union[dict, list, str]]] - batch of inputs to the model. Expects a list of dicts of the format { 'id': str, 'input_value': Union[dict, list, str] }.

Returns:

List of dicts of format { 'id': str, 'model_invocation': Union[ModelInvocation, Any] }. 'id' must match the corresponding input_batch element's 'id'.