Get Started with Function Calling Evaluation

Agents interact with the external world through function calling: they generate valid API calls that are passed to an execution environment. Okareo's function calling evaluations let you measure the accuracy of your agent's function calls.

What do you need?

You will need an environment for running Okareo. TypeScript and Python are both available. Please see the SDK sections for more on how to set up each.

Cookbook examples for this guide are available:

Scenarios for Function Calling

In a function calling scenario, you will need to specify the expected function call in each scenario row's result field. This should resemble the following:

{
    "name": str,  # the name of the function to be called
    "parameter_definitions": {
        "parameter_1": {
            "value": ...,
            "type": str | bool | int | float | dict,
            "required": bool,
        },
        ...
    }
}
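As a concrete instance of this shape, here is a minimal sketch that builds the expected call for a "delete the account for Bob" row. The helper `expected_call` is hypothetical (it is not part of the Okareo SDK), and writing `type` as the string name of the Python type is an assumption made so the result stays JSON-serializable.

```python
import json


def expected_call(function_name, values, required=()):
    """Build an expected-function-call dict in the shape shown above.

    Hypothetical helper for illustration only; not part of the Okareo SDK.
    """
    return {
        "name": function_name,
        "parameter_definitions": {
            param: {
                "value": value,
                # Assumption: the type is recorded as its string name, e.g. "str".
                "type": type(value).__name__,
                "required": param in required,
            }
            for param, value in values.items()
        },
    }


# Expected result for an input such as "Please delete the account for Bob".
delete_bob = expected_call(
    "delete_account", {"username": "Bob"}, required=("username",)
)
print(json.dumps(delete_bob, indent=2))
```

The resulting dictionary is what would go in the scenario row's result field.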

CustomModels for Function Calling

To use a function-call capable model in Okareo, you can define the invoke method of a CustomModel. The output of the model should be formatted as follows:

{
    "tool_calls": [
        {
            "name": str,  # the name of the called function
            "parameters": {
                "parameter_1": ...,  # value of parameter_1
            }
        }
    ]
}
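If your underlying model returns tool calls in a different shape, a small adapter inside invoke can produce the format above. The sketch below assumes the provider returns each call as a dict with a nested `function.name` and a JSON-encoded `function.arguments` string (the shape used by OpenAI-style chat APIs). The function name `to_tool_calls_output` is hypothetical.

```python
import json


def to_tool_calls_output(raw_tool_calls):
    """Convert provider tool calls into the {"tool_calls": [...]} format.

    Assumption: each raw call looks like
    {"function": {"name": str, "arguments": "<JSON string>"}}.
    """
    out = {"tool_calls": []}
    for call in raw_tool_calls:
        function = call["function"]
        # Arguments arrive as a JSON string; treat a missing or empty one as {}.
        arguments = function.get("arguments") or "{}"
        out["tool_calls"].append(
            {"name": function["name"], "parameters": json.loads(arguments)}
        )
    return out


raw = [
    {
        "id": "call_1",
        "type": "function",
        "function": {
            "name": "delete_account",
            "arguments": '{"username": "Bob"}',
        },
    }
]
converted = to_tool_calls_output(raw)
print(converted)
```

The converted dictionary can then be passed as the model prediction returned from invoke.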

For an illustrative example of a CustomModel that uses function calling, see the following snippet.

from okareo.model_under_test import CustomModel, ModelInvocation


class FunctionCallModel(CustomModel):
    def __init__(self, name):
        super().__init__(name)
        # usernames the toy model knows how to recognize
        self.usernames = ["Bob", "Alice", "John"]

    def invoke(self, input_value):
        out = {"tool_calls": []}
        tool_call = {"name": "unknown"}

        # parse out the function name (simple, case-sensitive keyword match)
        if "delete" in input_value:
            tool_call["name"] = "delete_account"
        if "create" in input_value:
            tool_call["name"] = "create_account"

        # parse out the function parameter (first known username found)
        tool_call["parameters"] = {}
        for username in self.usernames:
            if username in input_value:
                tool_call["parameters"]["username"] = username
                break

        # package the tool call and return
        out["tool_calls"].append(tool_call)
        return ModelInvocation(
            model_prediction=out,
            model_input=input_value
        )
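To see what this toy model would emit without installing the SDK, the same keyword logic can be pulled into a plain function. This is a sketch for local experimentation only. `parse_tool_call` is a hypothetical name, and it returns the raw prediction dictionary rather than a ModelInvocation.

```python
def parse_tool_call(input_value, usernames=("Bob", "Alice", "John")):
    """Standalone copy of the keyword-matching logic in the snippet above."""
    tool_call = {"name": "unknown", "parameters": {}}

    # Function name: case-sensitive substring match on the input.
    if "delete" in input_value:
        tool_call["name"] = "delete_account"
    if "create" in input_value:
        tool_call["name"] = "create_account"

    # Parameter: the first known username that appears in the input.
    for username in usernames:
        if username in input_value:
            tool_call["parameters"]["username"] = username
            break

    return {"tool_calls": [tool_call]}


matched = parse_tool_call("Please delete the account for Alice")
unmatched = parse_tool_call("What is the weather today?")
print(matched)
print(unmatched)
```

When nothing matches, the function name stays "unknown" and the parameters are empty, which is the kind of output the function calling checks are meant to flag.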

Checks for Function Calling

The following predefined checks are available to help you evaluate your function-calling agents.

  • is_function_correct: Checks if the generated function call in the model_output matches the expected function call in the scenario_result.
  • are_required_params_present: Checks if the generated parameters in the model_output contain the required parameters in the scenario_result.
  • are_all_params_expected: Checks if the generated parameter names in the model_output are expected based on the schema in the scenario_result.
  • do_param_values_match: Checks if each specified parameter value in the scenario_result matches the corresponding parameter value in the model_output.
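To make the four descriptions above concrete, the sketch below re-implements their stated intent as plain Python predicates. These are local approximations, not Okareo's implementations. The function names are hypothetical, only the first tool call is inspected, and a parameter counts as "specified" when its definition has a "value" key. All of these are assumptions.

```python
def first_call(model_output):
    """Return the first tool call, or an empty dict if there is none."""
    calls = model_output.get("tool_calls") or [{}]
    return calls[0]


def function_matches(model_output, expected):
    # Same function name as the expected call.
    return first_call(model_output).get("name") == expected["name"]


def required_present(model_output, expected):
    # Every parameter marked required appears in the generated call.
    got = first_call(model_output).get("parameters", {})
    defs = expected.get("parameter_definitions", {})
    return all(name in got for name, d in defs.items() if d.get("required"))


def all_expected(model_output, expected):
    # No generated parameter falls outside the expected schema.
    got = first_call(model_output).get("parameters", {})
    return set(got) <= set(expected.get("parameter_definitions", {}))


def values_match(model_output, expected):
    # Every expected value is reproduced exactly in the generated call.
    got = first_call(model_output).get("parameters", {})
    defs = expected.get("parameter_definitions", {})
    return all(
        name in got and got[name] == d["value"]
        for name, d in defs.items()
        if "value" in d
    )


expected = {
    "name": "delete_account",
    "parameter_definitions": {
        "username": {"value": "Bob", "type": "str", "required": True}
    },
}
good = {"tool_calls": [{"name": "delete_account", "parameters": {"username": "Bob"}}]}
bad = {"tool_calls": [{"name": "create_account", "parameters": {"user": "Bob"}}]}

predicates = (function_matches, required_present, all_expected, values_match)
good_results = [check(good, expected) for check in predicates]
bad_results = [check(bad, expected) for check in predicates]
print(good_results, bad_results)
```

Separating the four predicates this way shows why they are reported individually: a call can name the right function while still omitting a required parameter or supplying the wrong value.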

You can run these checks by passing their names to run_test() on your model:

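from okareo_api_client.models.test_run_type import TestRunType

# model_under_test and tool_scenario are assumed to have been created earlier
model_under_test.run_test(
    name="My Function Call Evaluation",
    scenario=tool_scenario.scenario_id,
    test_run_type=TestRunType.NL_GENERATION,
    checks=[
        "is_function_correct",
        "are_required_params_present",
        "are_all_params_expected",
        "do_param_values_match",
    ],
)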