Creating Custom Checks

If the out-of-the-box checks do not serve your needs, then you can generate and upload your own Python-based checks.

Generating checks

To help you create your own checks, the Okareo SDK provides the generate_check method. You can describe the logic of your check using natural language, and an LLM will generate an evaluate method meeting those requirements.

For example, below we generate a check that looks for at least one line of natural language in the model output.

const generated_check = await okareo.generate_check({
    project_id,
    name: "demo.summaryUnder256",
    description: "Pass if model_output contains at least one line of natural language.",
    output_data_type: "bool",
    requires_scenario_input: true,
    requires_scenario_result: true,
});

return await okareo.upload_check({
    project_id,
    ...generated_check
} as UploadEvaluatorProps);

note

Please ensure that requires_scenario_input and requires_scenario_result are correctly configured for your check.

For example, if your check relies on the scenario_input, then you should set requires_scenario_input: true.
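
For reference, the artifact produced by generate_check is Python code containing an evaluate function. The sketch below is only an illustration of what the generated logic for the description above might look like; the actual generated code, and the exact evaluate signature Okareo expects, may differ.

# Illustrative sketch only: the real generated code and signature may differ.
# Assumes evaluate receives the model output plus the scenario input/result,
# matching requires_scenario_input: true and requires_scenario_result: true above.
def evaluate(model_output: str, scenario_input: str, scenario_result: str) -> bool:
    # Treat a line as "natural language" if it contains at least a few
    # alphabetic words separated by spaces.
    for line in model_output.splitlines():
        words = [w for w in line.split() if w.isalpha()]
        if len(words) >= 3:
            return True
    return False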

Uploading checks

Once you have a check, whether generated or written by hand, the Okareo SDK provides the upload_check method, which uploads it so it can be run as a custom check in Okareo.

const upload_check: any = await okareo.upload_check({
    name: 'Example Uploaded Check',
    project_id,
    description: "Pass if the model result length is within 10% of the expected result.",
    requires_scenario_input: false,
    requires_scenario_result: true,
    output_data_type: "bool",
    file_path: "tests/example_eval.py",
    update: true
});

note

Your evaluate function must be saved locally as a .py file, and the file_path should point to this .py file.
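
As a concrete illustration, a file like tests/example_eval.py could contain an evaluate function along these lines. This is a minimal sketch: the parameter names and the exact signature Okareo passes to uploaded checks are assumptions here. Since the check above sets requires_scenario_result: true and requires_scenario_input: false, the logic only uses the model output and the expected scenario result.

# Minimal sketch of an uploaded check (e.g. tests/example_eval.py).
# Assumed signature: the exact parameters Okareo provides may differ.
def evaluate(model_output: str, scenario_result: str) -> bool:
    # Pass if the model result length is within 10% of the expected result length.
    expected_len = len(scenario_result)
    if expected_len == 0:
        return len(model_output) == 0
    return abs(len(model_output) - expected_len) <= 0.1 * expected_len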

Evaluating with uploaded checks

Once the check has been uploaded, you can use it in a model_under_test.run_test call by adding the check's name or ID to your list of checks. For example:

// provide a list of checks by name or ID
const eval_results: any = await model.run_test({
    model_api_key: OPENAI_API_KEY,
    name: 'Evaluation Name',
    tags: ["Example", `Build:${UNIQUE_BUILD_ID}`],
    project_id: project_id,
    scenario_id: scenario_id,
    calculate_metrics: true,
    type: TestRunType.NL_GENERATION,
    checks: [
        "check_name_1",
        "check_name_2",
        ...
    ],
} as RunTestProps);