Code Checks

A Code Check uses Python code to evaluate the generated response. This is useful when you need more complex logic or want to incorporate domain-specific knowledge into your check.

Custom Code Checks

To use a custom Code Check:

  1. Create a new Python file (not in a notebook).
  2. In this file, define a class named 'Check' that inherits from CodeBasedCheck.
  3. Implement the evaluate method in your Check class.
  4. Include any additional code used by your check in the same file.

Here's an example:

# In my_custom_check.py
from typing import Union

from okareo.checks import CodeBasedCheck

class Check(CodeBasedCheck):
    @staticmethod
    def evaluate(
        model_output: str, scenario_input: str, scenario_result: str
    ) -> Union[bool, int, float]:
        # Your evaluation logic here
        word_count = len(model_output.split())
        return word_count > 10  # Returns True if output has more than 10 words

The evaluate method must accept model_output, scenario_input, and scenario_result as arguments and return a boolean, an integer, or a float.

Then, you can create or update the check using:

check_sample_code = okareo.create_or_update_check(
    name="check_sample_code",
    description="Check if output has more than 10 words",
    check=Check(),
)

Okareo Code Checks

In Okareo, we provide out-of-the-box checks to assess your LLM's performance. In the Okareo SDK, you can list the available checks by running the following method:

okareo.get_all_checks()

To use any of these checks, you simply specify them when running an evaluation as follows:

const checks = ['check_name_1', 'check_name_2', ..., 'check_name_N'];

// assume that "scenario" is a ScenarioSetResponse object or a UUID
const eval_results: any = await model.run_test({
model_api_key: OPENAI_API_KEY,
name: 'Evaluation Name',
tags: ["Example", `Build:${UNIQUE_BUILD_ID}`],
project_id: project_id,
scenario_id: scenario_id,
calculate_metrics: true,
type: TestRunType.NL_GENERATION,
checks: checks,
} as RunTestProps);

As of now, the following out-of-the-box code checks are available in Okareo:

  • does_code_compile
  • contains_all_imports
  • compression_ratio
  • levenshtein_distance/levenshtein_distance_input
  • function_call_ast_validator

Natural Language Checks

Compression Ratio

Name: compression_ratio.

The compression ratio is a measure of how much smaller (or larger) a generated text is compared with a scenario input. In Okareo, requesting the compression_ratio check will invoke the following evaluate method:

class Check(BaseCheck):
    @staticmethod
    def evaluate(model_output: str, scenario_input: str) -> float:
        return len(model_output) / len(scenario_input)
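Run outside Okareo, the same computation looks like this (the strings here are purely illustrative):

```python
def compression_ratio(model_output: str, scenario_input: str) -> float:
    # Same computation as the check above: ratio of output length to
    # input length; values below 1.0 mean the output is shorter.
    return len(model_output) / len(scenario_input)

article = "Domestic cats spend roughly two thirds of the day asleep."
summary = "Cats sleep a lot."
print(compression_ratio(summary, article))  # well below 1.0
```

A summarization model would typically be expected to score below 1.0 on this check, while an elaboration or expansion task would score above it.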

Levenshtein Distance

Names: levenshtein_distance, levenshtein_distance_input.

The Levenshtein distance measures the minimum number of edits needed to transform one string into another, where an "edit" can be an insertion, a deletion, or a substitution. In Okareo, requesting the levenshtein_distance check will invoke the following evaluate method:

class Check(BaseCheck):
    @staticmethod
    def evaluate(model_output: str, scenario_response: str):
        # use Levenshtein distance with uniform weights
        weights = [1, 1, 1]
        return levenshtein_distance(model_output, scenario_response, weights)

def levenshtein_distance(s1, s2, weights):
    if len(s1) < len(s2):
        return levenshtein_distance(s2, s1, weights)

    if len(s2) == 0:
        return len(s1)

    previous_row = range(len(s2) + 1)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + weights[0]
            deletions = current_row[j] + weights[1]
            substitutions = previous_row[j] + (c1 != c2) * weights[2]
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row

    return previous_row[-1]
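As a sanity check on the algorithm above, here is a compact unit-cost version (equivalent to weights [1, 1, 1]) applied to the textbook example, where transforming "kitten" into "sitting" takes three edits:

```python
def levenshtein(s1: str, s2: str) -> int:
    # Dynamic-programming Levenshtein distance with unit costs,
    # equivalent to the weighted version above with weights [1, 1, 1].
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            current_row.append(min(
                previous_row[j + 1] + 1,        # insertion
                current_row[j] + 1,             # deletion
                previous_row[j] + (c1 != c2),   # substitution
            ))
        previous_row = current_row
    return previous_row[-1]

print(levenshtein("kitten", "sitting"))  # 3: k->s, e->i, insert g
```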

Similarly, the levenshtein_distance_input call will use the following evaluate method:


class Check(BaseCheck):
    @staticmethod
    def evaluate(model_output: str, scenario_input: str):
        # use Levenshtein distance with uniform weights
        weights = [1, 1, 1]
        return levenshtein_distance(model_output, scenario_input, weights)

Function Call Checks

The following checks are used to validate LLMs/agents that generate function calls.

Function Call AST Validator

Name: function_call_ast_validator.

Validates function calls using the simple AST checker from the Berkeley Function Call Leaderboard repo. The tool call in the model output is compared against the expected structure defined in the scenario result.
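To illustrate the general idea, the sketch below (a hypothetical helper, not the BFCL checker itself) parses a function-call expression with Python's ast module so that its name and arguments can be compared against an expected structure:

```python
import ast

def parse_call(call_src: str):
    # Hypothetical helper (not the actual BFCL checker): parse a function
    # call expression into its name and keyword arguments so the pieces
    # can be compared against an expected structure.
    node = ast.parse(call_src, mode="eval").body
    assert isinstance(node, ast.Call)
    name = node.func.id
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs

print(parse_call('get_weather(city="Paris", days=3)'))
# ('get_weather', {'city': 'Paris', 'days': 3})
```

Parsing to an AST rather than comparing raw strings makes the check insensitive to whitespace and argument formatting.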

Function Call Reference Validator

Name: function_call_reference_validator.

Validates function calls by comparing the structure and content of tool calls in the model output against the expected structure defined in the scenario result. It ensures that all required parameters are present and match any specified patterns, supporting nested structures and regex matching for string values.
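The comparison this describes can be sketched roughly as follows; this is an illustrative simplification, not Okareo's actual implementation:

```python
import re

def reference_matches(tool_call: dict, expected: dict) -> bool:
    # Simplified comparator (illustrative only): the expected structure
    # maps keys to values; expected strings are treated as regex patterns,
    # and nested dicts are compared recursively.
    for key, want in expected.items():
        if key not in tool_call:
            return False  # a required parameter is missing
        got = tool_call[key]
        if isinstance(want, dict) and isinstance(got, dict):
            if not reference_matches(got, want):
                return False
        elif isinstance(want, str):
            if not re.fullmatch(want, str(got)):
                return False
        elif got != want:
            return False
    return True

call = {"name": "get_weather", "arguments": {"city": "Paris", "units": "metric"}}
expected = {"name": "get_weather", "arguments": {"city": r"[A-Z][a-z]+", "units": "metric"}}
print(reference_matches(call, expected))  # True
```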

Do Param Values Match

Name: do_param_values_match.

Validates function calls by comparing the structure and content of tool calls in the model output against the expected structure defined in the scenario result. It ensures that all required parameters are present and match any specified patterns, supporting nested structures and regex matching for string values.

Are All Params Expected

Name: are_all_params_expected.

Checks whether the generated argument names in the function call are expected based on the schema in the scenario_result, ensuring that the generated arguments are not hallucinated.
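In essence, the check asserts that the generated argument names form a subset of the schema's declared parameters. A minimal sketch (illustrative only, assuming a JSON-Schema-style tool definition):

```python
def all_params_expected(generated_args: dict, tool_schema: dict) -> bool:
    # Illustrative sketch (not Okareo's implementation): every generated
    # argument name must appear among the schema's declared properties,
    # so any hallucinated argument causes the check to fail.
    allowed = set(tool_schema.get("parameters", {}).get("properties", {}))
    return set(generated_args) <= allowed

schema = {"parameters": {"properties": {"city": {}, "units": {}}, "required": ["city"]}}
print(all_params_expected({"city": "Paris"}, schema))                   # True
print(all_params_expected({"city": "Paris", "zip": "75001"}, schema))   # False
```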

Are Required Params Present

Name: are_required_params_present.

Checks whether the generated arguments in the function call contain all of the required arguments from the scenario_result.
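This is the complement of the previous check: instead of rejecting extra arguments, it confirms that nothing required is missing. A minimal sketch (illustrative only, assuming a JSON-Schema-style tool definition):

```python
def required_params_present(generated_args: dict, tool_schema: dict) -> bool:
    # Illustrative sketch (not Okareo's implementation): every name listed
    # under "required" in the tool schema must appear in the generated
    # arguments.
    required = tool_schema.get("parameters", {}).get("required", [])
    return all(name in generated_args for name in required)

schema = {"parameters": {"properties": {"city": {}, "units": {}}, "required": ["city"]}}
print(required_params_present({"city": "Paris"}, schema))    # True
print(required_params_present({"units": "metric"}, schema))  # False
```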

Is Function Correct

Name: is_function_correct.

Checks if the generated function call name(s) in the tool_call matches the expected function call name(s) in the scenario_result.

Code Generation Checks

Does Code Compile

Name: does_code_compile.

This check verifies that the generated Python code compiles, which lets you tell whether the output contains any non-Pythonic content (e.g., natural language or HTML). Requesting the does_code_compile check will run the following evaluate method:

class Check(BaseCheck):
    @staticmethod
    def evaluate(model_output: str) -> bool:
        try:
            compile(model_output, '<string>', 'exec')
            return True
        except SyntaxError:
            return False
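For example, the same compile-based logic distinguishes valid code from a conversational response (the sample strings are illustrative):

```python
def code_compiles(source: str) -> bool:
    # Same logic as the evaluate method above: attempt to byte-compile
    # the source and report whether it parses as valid Python.
    try:
        compile(source, '<string>', 'exec')
        return True
    except SyntaxError:
        return False

print(code_compiles("def add(a, b):\n    return a + b"))        # True
print(code_compiles("Sure! Here is the function you asked for:"))  # False
```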

Code Contains All Imports

Name: contains_all_imports.

This check looks at all the object/function calls in the generated code and ensures that the corresponding import statements are included.
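A rough approximation of this idea (not Okareo's implementation) can be built with Python's ast module by comparing the names the code binds against the module-style names it references:

```python
import ast

def contains_all_imports(source: str) -> bool:
    # Simplified sketch (not Okareo's implementation): every module-style
    # reference such as `np.array` must have a matching import binding
    # (`import numpy as np`) or local assignment somewhere in the code.
    tree = ast.parse(source)
    bound, referenced = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            bound.update((a.asname or a.name).split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            bound.update(a.asname or a.name for a in node.names)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            referenced.add(node.value.id)
    return referenced <= bound

print(contains_all_imports("import numpy as np\nx = np.array([1, 2])"))  # True
print(contains_all_imports("x = np.array([1, 2])"))                      # False
```

The real check must also handle cases this sketch ignores, such as names bound by function parameters or comprehensions.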