Creating Custom Checks
If the predefined checks do not serve your needs, you can create your own. Okareo supports two types of custom checks:
- Code-based checks: Python code with an evaluate method (see the sketch after this list). See Code Checks for the full contract (allowed parameters, return types, allowed imports).
- Model checks: a prompt template evaluated by a judge LLM. See Model Checks for template variables and prompt structure guidance.
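For example, a minimal code check is just an evaluate function. The sketch below is an illustration only (it assumes the bare-evaluate form of the contract, with model_output as the only parameter; the exact contract is defined in Code Checks) and passes only when the model output contains no natural-language lines:

def evaluate(model_output: str) -> bool:
    # Return False if model_output contains at least one line of natural
    # language, crudely approximated here as a line with three or more
    # purely alphabetic words; otherwise return True.
    for line in model_output.splitlines():
        alphabetic_words = [word for word in line.split() if word.isalpha()]
        if len(alphabetic_words) >= 3:
            return False
    return True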
Custom Code Checks
Generating code checks
The Okareo SDK provides a generate_check method that uses an LLM to generate an evaluate method from a natural-language description.
- TypeScript
- Python
const generated_check = await okareo.generate_check({
    project_id,
    name: "demo.summaryUnder256",
    description: "Pass if model_output is no more than 256 characters long.",
    output_data_type: "bool",
    // this check only inspects model_output, so no scenario fields are needed
    requires_scenario_input: false,
    requires_scenario_result: false,
});
// register the generated check so it can be used in evaluations
return await okareo.upload_check({
    project_id,
    ...generated_check
} as UploadEvaluatorProps);
from okareo_api_client.models.evaluator_spec_request import EvaluatorSpecRequest

# "okareo" is an Okareo client instance, e.g. okareo = Okareo(OKAREO_API_KEY)
description = """
Return False if model_output contains at least one line of natural language.
Otherwise, return True.
"""

generate_request = EvaluatorSpecRequest(
    description=description,
    requires_scenario_input=False,
    requires_scenario_result=False,
    output_data_type="bool",
)
generated_test = okareo.generate_check(generate_request).generated_code
Ensure that requires_scenario_input and requires_scenario_result are configured correctly for your check: if the check relies on the scenario_input, set requires_scenario_input=True, and likewise set requires_scenario_result=True if it relies on the scenario_result (see the sketch below).
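For instance (a hypothetical sketch; the allowed parameters and signatures are listed in Code Checks), a check generated with requires_scenario_input=True might receive the scenario input alongside the model output:

# Hypothetical generated code for a check with requires_scenario_input=True:
# the evaluate method can use the scenario input in addition to the output.
def evaluate(model_output: str, scenario_input: str) -> bool:
    # Fail if the model simply echoes the scenario input back unchanged.
    return model_output.strip() != scenario_input.strip()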
Uploading code checks
Given generated (or hand-written) check code, the Okareo SDK provides the upload_check method to register it.
- TypeScript
- Python
const upload_check: any = await okareo.upload_check({
    name: 'Example Uploaded Check',
    project_id,
    description: "Pass if the model result length is within 10% of the expected result.",
    requires_scenario_input: false,
    requires_scenario_result: true,
    output_data_type: "bool",
    file_path: "tests/example_eval.py",
    update: true
});
import os
import tempfile

check_name = "has_no_natural_language"
temp_dir = tempfile.gettempdir()
file_path = os.path.join(temp_dir, f"{check_name}.py")

# write the generated evaluate code to a local .py file
with open(file_path, "w+") as file:
    file.write(generated_test)

has_no_nl_check = okareo.upload_check(
    name=check_name,
    file_path=file_path,
    requires_scenario_input=False,
    requires_scenario_result=False,
)
Your evaluate function must be saved locally as a .py file, and file_path must point to that file.
For the full code check contract — allowed parameters, return types, allowed imports, and restrictions — see Code Checks.
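As a concrete illustration, the TypeScript upload_check call above registers a hand-written check from tests/example_eval.py. One possible implementation of that description (pass if the model result length is within 10% of the expected result) is sketched below, assuming the bare-evaluate form of the contract:

# tests/example_eval.py: illustrative sketch of a hand-written code check
def evaluate(model_output: str, scenario_result: str) -> bool:
    # Pass if the model result length is within 10% of the expected result length.
    expected_len = len(scenario_result)
    if expected_len == 0:
        return len(model_output) == 0
    return abs(len(model_output) - expected_len) <= 0.1 * expected_len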
Custom Model Checks
You can also create custom model checks by providing a prompt template. The prompt template is a set of instructions for a judge LLM that evaluates model output at runtime.
Creating a model check via the SDK
- TypeScript
- Python
const check = await okareo.create_or_update_check({
    name: "custom_coherence_check",
    description: "Rate the coherence of the model output on a 1-5 scale",
    check: {
        type: "model",
        prompt_template: `You will be given a Model Output.
Rate the output on one metric: Coherence (1-5).
Evaluation Criteria:
Coherence (1-5) - how well-structured and logically organized the output is.
Evaluation Steps:
1. Read the Model Output carefully.
2. Assess whether ideas flow logically and are well-organized.
3. Assign a Coherence score from 1 to 5.
Model Output:
{generation}
Evaluation Form (scores ONLY, one number):
- Coherence (1-5):`,
        check_type: CheckOutputType.SCORE,
    },
});
from okareo.checks import CheckOutputType, ModelBasedCheck

check = okareo.create_or_update_check(
    name="custom_coherence_check",
    description="Rate the coherence of the model output on a 1-5 scale",
    check=ModelBasedCheck(
        prompt_template="""You will be given a Model Output.
Rate the output on one metric: Coherence (1-5).
Evaluation Criteria:
Coherence (1-5) - how well-structured and logically organized the output is.
Evaluation Steps:
1. Read the Model Output carefully.
2. Assess whether ideas flow logically and are well-organized.
3. Assign a Coherence score from 1 to 5.
Model Output:
{generation}
Evaluation Form (scores ONLY, one number):
- Coherence (1-5):""",
        check_type=CheckOutputType.SCORE,
    ),
)
For the full list of template variables, prompt structure guidance, and output format details, see Model Checks.
Running custom checks
Once a custom check has been created (code-based or model-based), you can use it in an evaluation by adding the check's name or ID to your list of checks:
- TypeScript
- Python
// provide a list of checks by name or ID
const eval_results: any = await model.run_test({
    model_api_key: OPENAI_API_KEY,
    name: 'Evaluation Name',
    tags: ["Example", `Build:${UNIQUE_BUILD_ID}`],
    project_id: project_id,
    scenario_id: scenario_id,
    calculate_metrics: true,
    type: TestRunType.NL_GENERATION,
    checks: [
        "check_name_1",
        "check_name_2",
        ...
    ],
} as RunTestProps);
checks = [check_name]  # alternatively, reference the check by ID: has_no_nl_check.id

# assume "scenario" is a ScenarioSetResponse object or a UUID
evaluation = model_under_test.run_test(
    name="Evaluation Name",
    scenario=scenario,
    test_run_type=TestRunType.NL_GENERATION,
    calculate_metrics=True,
    checks=checks,
)
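When the run completes, each custom check appears as its own metric in the evaluation results. For example, with the Python client above, the returned test run object links back to the results in the Okareo app (a minimal sketch, assuming the app_link field on the object returned by run_test):

# Link to the evaluation results, including per-check scores, in the Okareo app.
print(f"See results in Okareo: {evaluation.app_link}")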