Creating Custom Checks
If the out-of-the-box checks do not serve your needs, then you can generate and upload your own Python-based checks.
Generating checks
To help you create your own checks, the Okareo SDK provides the `generate_check` method. You can describe the logic of your check in natural language, and an LLM will generate an `evaluate` method that meets those requirements.
For example, below we generate a check that looks for natural language in the model output.
- Typescript
- Python
const generated_check = await okareo.generate_check({
    project_id,
    name: "demo.summaryUnder256",
    description: "Pass if model_output contains at least one line of natural language.",
    output_data_type: "bool",
    requires_scenario_input: true,
    requires_scenario_result: true,
});

return await okareo.upload_check({
    project_id,
    ...generated_check
} as UploadEvaluatorProps);
from okareo_api_client.models.evaluator_spec_request import EvaluatorSpecRequest

description = """
Return `False` if `model_output` contains at least one line of natural language.
Otherwise, return `True`.
"""

generate_request = EvaluatorSpecRequest(
    description=description,
    requires_scenario_input=False,
    requires_scenario_result=False,
    output_data_type="bool",
)

generated_test = okareo.generate_check(generate_request).generated_code
Please ensure that `requires_scenario_input` and `requires_scenario_result` are correctly configured for your check. For example, if your check relies on the `scenario_input`, then you should set `requires_scenario_input=True`, as in the sketch below.
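For instance, a check whose logic depends on the scenario input needs `requires_scenario_input=True` when it is generated. The request below is a minimal sketch that reuses the `EvaluatorSpecRequest` pattern shown above; the check description and variable names are hypothetical.

from okareo_api_client.models.evaluator_spec_request import EvaluatorSpecRequest

# Hypothetical check that needs the scenario input as well as the model output.
keyword_description = """
Return `True` if every word in `scenario_input` also appears in `model_output`.
Otherwise, return `False`.
"""

keyword_request = EvaluatorSpecRequest(
    description=keyword_description,
    requires_scenario_input=True,    # the generated evaluate method will receive scenario_input
    requires_scenario_result=False,  # scenario_result is not needed for this logic
    output_data_type="bool",
)

keyword_check_code = okareo.generate_check(keyword_request).generated_code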
Uploading checks
Given a generated check, the Okareo SDK provides the `upload_check` method, which allows you to run custom checks in Okareo.
- Typescript
- Python
const upload_check: any = await okareo.upload_check({
    name: 'Example Uploaded Check',
    project_id,
    description: "Pass if the model result length is within 10% of the expected result.",
    requires_scenario_input: false,
    requires_scenario_result: true,
    output_data_type: "bool",
    file_path: "tests/example_eval.py",
    update: true
});
import os
import tempfile

check_name = "has_no_natural_language"

# Write the generated check code to a local .py file
temp_dir = tempfile.gettempdir()
file_path = os.path.join(temp_dir, f"{check_name}.py")
with open(file_path, "w+") as file:
    file.write(generated_test)

has_no_nl_check = okareo.upload_check(
    name=check_name,
    file_path=file_path,
    requires_scenario_input=False,
    requires_scenario_result=False,
)
Your `evaluate` function must be saved locally as a `.py` file, and the `file_path` should point to this `.py` file.
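The file produced by `generate_check` already has this shape, so you normally only need to write it to disk as shown above. As a point of reference, a hand-written check for the natural-language example might look like the sketch below; the parameter list is an assumption based on the `requires_scenario_*` flags (both `False` here, so only the model output is used), and the word-count heuristic is illustrative only.

import re

# Illustrative sketch only -- generate_check produces the real file for you.
# With requires_scenario_input and requires_scenario_result both False,
# the check looks at nothing but the model output.
def evaluate(model_output: str) -> bool:
    for line in model_output.splitlines():
        # Treat a line with several alphabetic words as natural language.
        if len(re.findall(r"[A-Za-z]{2,}", line)) >= 4:
            return False  # natural language found
    return True  # no natural language detected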
Evaluating with uploaded checks
Once the check has been uploaded, you can use it in a `model_under_test.run_test` call by adding the name or the ID of the check to your list of `checks`. For example:
- Typescript
- Python
// provide a list of checks by name or ID
const eval_results: any = await model.run_test({
    model_api_key: OPENAI_API_KEY,
    name: 'Evaluation Name',
    tags: ["Example", `Build:${UNIQUE_BUILD_ID}`],
    project_id: project_id,
    scenario_id: scenario_id,
    calculate_metrics: true,
    type: TestRunType.NL_GENERATION,
    checks: [
        "check_name_1",
        "check_name_2",
        ...
    ],
} as RunTestProps);
checks = [check_name]  # alternatively: has_no_nl_check.id

# assume that "scenario" is a ScenarioSetResponse object or a UUID
evaluation = model_under_test.run_test(
    name="Evaluation Name",
    scenario=scenario,
    test_run_type=TestRunType.NL_GENERATION,
    calculate_metrics=True,
    checks=checks,
)
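After the evaluation completes, the per-check scores are available in the Okareo app. The snippet below is a small sketch for surfacing the results link; it assumes the object returned by `run_test` exposes `name` and `app_link` attributes, as in other Okareo evaluation examples.

# Print a link to the evaluation results, which include scores for each check in `checks`.
print(f"Evaluation '{evaluation.name}' finished.")
print(f"View per-check results in Okareo: {evaluation.app_link}")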