Creating Custom Checks
If the predefined checks do not serve your needs, you can create your own. Okareo supports two types of custom checks:
- Code-based checks: Python code with an evaluate method (see the sketch after this list). See Code Checks for the full contract (allowed parameters, return types, allowed imports).
- Model checks: a prompt template evaluated by a judge LLM. See Model Checks for template variables and prompt structure guidance.
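For example, a minimal code check is just an evaluate function. The sketch below is an illustration only (it assumes the bare-evaluate form of the contract, with model_output as the only parameter; the exact contract is defined in Code Checks) and passes only when the model output contains no natural-language lines:

def evaluate(model_output: str) -> bool:
    # Return False if model_output contains at least one line of natural
    # language, crudely approximated here as a line with three or more
    # purely alphabetic words; otherwise return True.
    for line in model_output.splitlines():
        alphabetic_words = [word for word in line.split() if word.isalpha()]
        if len(alphabetic_words) >= 3:
            return False
    return True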
Custom Code Checks
Generating code checks
The Okareo SDK provides a generate_check method that uses an LLM to generate an evaluate method from a natural-language description.
- TypeScript
- Python
const generated_check = await okareo.generate_check({
    project_id,
    name: "demo.summaryUnder256",
    description: "Pass if model_output is no more than 256 characters long.",
    output_data_type: "bool",
    // this check only inspects model_output, so no scenario fields are needed
    requires_scenario_input: false,
    requires_scenario_result: false,
});
// register the generated check so it can be used in evaluations
return await okareo.upload_check({
    project_id,
    ...generated_check
} as UploadEvaluatorProps);
from okareo_api_client.models.evaluator_spec_request import EvaluatorSpecRequest

# "okareo" is an Okareo client instance, e.g. okareo = Okareo(OKAREO_API_KEY)
description = """
Return False if model_output contains at least one line of natural language.
Otherwise, return True.
"""

generate_request = EvaluatorSpecRequest(
    description=description,
    requires_scenario_input=False,
    requires_scenario_result=False,
    output_data_type="bool",
)
generated_test = okareo.generate_check(generate_request).generated_code
Ensure that requires_scenario_input and requires_scenario_result are configured correctly for your check: if the check relies on the scenario_input, set requires_scenario_input=True, and likewise set requires_scenario_result=True if it relies on the scenario_result (see the sketch below).
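For instance (a hypothetical sketch; the allowed parameters and signatures are listed in Code Checks), a check generated with requires_scenario_input=True might receive the scenario input alongside the model output:

# Hypothetical generated code for a check with requires_scenario_input=True:
# the evaluate method can use the scenario input in addition to the output.
def evaluate(model_output: str, scenario_input: str) -> bool:
    # Fail if the model simply echoes the scenario input back unchanged.
    return model_output.strip() != scenario_input.strip()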
Uploading code checks
Given generated (or hand-written) check code, the Okareo SDK provides the upload_check method to register it.
- TypeScript
- Python
const upload_check: any = await okareo.upload_check({
    name: 'Example Uploaded Check',
    project_id,
    description: "Pass if the model result length is within 10% of the expected result.",
    requires_scenario_input: false,
    requires_scenario_result: true,
    output_data_type: "bool",
    file_path: "tests/example_eval.py",
    update: true
});
import os
import tempfile

check_name = "has_no_natural_language"
temp_dir = tempfile.gettempdir()
file_path = os.path.join(temp_dir, f"{check_name}.py")

# write the generated evaluate code to a local .py file
with open(file_path, "w+") as file:
    file.write(generated_test)

has_no_nl_check = okareo.upload_check(
    name=check_name,
    file_path=file_path,
    requires_scenario_input=False,
    requires_scenario_result=False,
)
Your evaluate function must be saved locally as a .py file, and file_path must point to that file.
For the full code check contract — allowed parameters, return types, allowed imports, and restrictions — see Code Checks.
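As a concrete illustration, the TypeScript upload_check call above registers a hand-written check from tests/example_eval.py. One possible implementation of that description (pass if the model result length is within 10% of the expected result) is sketched below, assuming the bare-evaluate form of the contract:

# tests/example_eval.py: illustrative sketch of a hand-written code check
def evaluate(model_output: str, scenario_result: str) -> bool:
    # Pass if the model result length is within 10% of the expected result length.
    expected_len = len(scenario_result)
    if expected_len == 0:
        return len(model_output) == 0
    return abs(len(model_output) - expected_len) <= 0.1 * expected_len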
Custom Model Checks
You can also create custom model checks by providing a prompt template. The prompt template is a set of instructions for a judge LLM that evaluates model output at runtime.
Creating a model check via the SDK
- TypeScript
- Python
const check = await okareo.create_or_update_check({
    name: "custom_coherence_check",
    description: "Rate the coherence of the model output on a 1-5 scale",
    check: {
        type: "model",
        prompt_template: `You will be given a Model Output.
Rate the output on one metric: Coherence (1-5).
Evaluation Criteria:
Coherence (1-5) - how well-structured and logically organized the output is.
Evaluation Steps:
1. Read the Model Output carefully.
2. Assess whether ideas flow logically and are well-organized.
3. Assign a Coherence score from 1 to 5.
Model Output:
{generation}
Evaluation Form (scores ONLY, one number):
- Coherence (1-5):`,
        check_type: CheckOutputType.SCORE,
    },
});
from okareo.checks import CheckOutputType, ModelBasedCheck

check = okareo.create_or_update_check(
    name="custom_coherence_check",
    description="Rate the coherence of the model output on a 1-5 scale",
    check=ModelBasedCheck(
        prompt_template="""You will be given a Model Output.
Rate the output on one metric: Coherence (1-5).
Evaluation Criteria:
Coherence (1-5) - how well-structured and logically organized the output is.
Evaluation Steps:
1. Read the Model Output carefully.
2. Assess whether ideas flow logically and are well-organized.
3. Assign a Coherence score from 1 to 5.
Model Output:
{generation}
Evaluation Form (scores ONLY, one number):
- Coherence (1-5):""",
        check_type=CheckOutputType.SCORE,
    ),
)
For the full list of template variables, prompt structure guidance, and output format details, see Model Checks.
Running custom checks
Once a custom check has been created (code-based or model-based), you can use it in an evaluation by adding the check's name or ID to your list of checks:
- TypeScript
- Python
// provide a list of checks by name or ID
const eval_results: any = await model.run_test({
    model_api_key: OPENAI_API_KEY,
    name: 'Evaluation Name',
    tags: ["Example", `Build:${UNIQUE_BUILD_ID}`],
    project_id: project_id,
    scenario_id: scenario_id,
    calculate_metrics: true,
    type: TestRunType.NL_GENERATION,
    checks: [
        "check_name_1",
        "check_name_2",
        ...
    ],
} as RunTestProps);
checks = [check_name]  # alternatively, reference the check by ID: has_no_nl_check.id

# assume "scenario" is a ScenarioSetResponse object or a UUID
evaluation = model_under_test.run_test(
    name="Evaluation Name",
    scenario=scenario,
    test_run_type=TestRunType.NL_GENERATION,
    calculate_metrics=True,
    checks=checks,
)
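When the run completes, each custom check appears as its own metric in the evaluation results. For example, with the Python client above, the returned test run object links back to the results in the Okareo app (a minimal sketch, assuming the app_link field on the object returned by run_test):

# Link to the evaluation results, including per-check scores, in the Okareo app.
print(f"See results in Okareo: {evaluation.app_link}")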