
Get Started with Function Calling Evaluation

Agents interact with the external world via function calling, i.e. generating valid API calls that are passed to an execution environment. With Okareo's function calling evaluations, you can measure the accuracy of your agent's function calls.

What do you need?

You will need an environment for running Okareo. TypeScript and Python SDKs are both available. Please see the SDK sections for more on how to set up each.

Cookbook examples for this guide are available:

Scenarios for Function Calling

In a function calling scenario, you will need to specify the expected function call in each scenario row's result field. This should resemble the following:

{
    "name": str,  # the name of the function to be called
    "parameter_definitions": {
        "parameter_1": {
            "value": ...,
            "type": str | bool | int | float | dict,
            "required": bool,
        },
        ...
    }
}
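As a concrete illustration, the sketch below builds one expected call in this shape. The `delete_account` function name, the `username` parameter, and the `make_expected_call` helper are hypothetical and not part of the Okareo SDK. Recording the type as a type-name string such as "str" is also an assumption made so the result stays JSON-serializable.

```python
import json


def make_expected_call(name, **params):
    """Build an expected-call dict in the shape shown above.

    Hypothetical helper: each keyword argument is a (value, required) pair.
    """
    definitions = {}
    for param_name, (value, required) in params.items():
        definitions[param_name] = {
            "value": value,
            # Assumption: the type is recorded by its Python type name.
            "type": type(value).__name__,
            "required": required,
        }
    return {"name": name, "parameter_definitions": definitions}


# Expected call for an input such as "Please delete the account for Bob".
expected = make_expected_call("delete_account", username=("Bob", True))
print(json.dumps(expected, indent=2))
```

The resulting dictionary is what would go in the result field of a single scenario row.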

CustomModels for Function Calling

To use a function-call capable model in Okareo, you can define the invoke method of a CustomModel. The output of the model should be formatted as follows:

{
    "tool_calls": [
        {
            "name": str,  # the name of the called function
            "parameters": {
                "parameter_1": ...,  # value of parameter_1
            }
        }
    ]
}
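Before wiring a model into an evaluation, it can help to confirm that its output really has this shape. The following is a minimal sketch of such a local check; `is_valid_tool_output` is a hypothetical helper, not an Okareo function, and it only checks the keys shown above.

```python
def is_valid_tool_output(output):
    """Return True if `output` matches the tool_calls shape shown above."""
    if not isinstance(output, dict):
        return False
    calls = output.get("tool_calls")
    if not isinstance(calls, list):
        return False
    for call in calls:
        if not isinstance(call, dict):
            return False
        # Each call needs a string name and a dict of parameter values.
        if not isinstance(call.get("name"), str):
            return False
        if not isinstance(call.get("parameters"), dict):
            return False
    return True


# Hypothetical model output for "create an account for Alice".
sample_output = {
    "tool_calls": [
        {"name": "create_account", "parameters": {"username": "Alice"}}
    ]
}
print(is_valid_tool_output(sample_output))
```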

For an illustrative example of a CustomModel that uses function calling, see the following snippet.

from okareo.model_under_test import CustomModel, ModelInvocation


class FunctionCallModel(CustomModel):
    def __init__(self, name):
        super().__init__(name)
        self.usernames = ["Bob", "Alice", "John"]

    def invoke(self, input_value):
        out = {"tool_calls": []}
        tool_call = {"name": "unknown"}

        # parse out the function name
        if "delete" in input_value:
            tool_call["name"] = "delete_account"
        if "create" in input_value:
            tool_call["name"] = "create_account"

        # parse out the function parameter
        tool_call["parameters"] = {}
        for username in self.usernames:
            if username in input_value:
                tool_call["parameters"]["username"] = username
                break

        # package the tool call and return
        out["tool_calls"].append(tool_call)
        return ModelInvocation(
            model_prediction=out,
            model_input=input_value
        )
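To see what this keyword-matching logic produces without installing the SDK, it can be exercised as a plain function. In the sketch below, `parse_tool_call` is a hypothetical stand-in that mirrors the body of `invoke` and returns only the prediction dictionary, with no `ModelInvocation` wrapper.

```python
USERNAMES = ["Bob", "Alice", "John"]


def parse_tool_call(input_value, usernames=USERNAMES):
    """Mirror of the invoke logic above, returning only the prediction dict."""
    tool_call = {"name": "unknown", "parameters": {}}

    # Pick the function name from keywords in the input.
    if "delete" in input_value:
        tool_call["name"] = "delete_account"
    if "create" in input_value:
        tool_call["name"] = "create_account"

    # Use the first known username that appears in the input.
    for username in usernames:
        if username in input_value:
            tool_call["parameters"]["username"] = username
            break

    return {"tool_calls": [tool_call]}


print(parse_tool_call("Please delete the account for Bob"))
# {'tool_calls': [{'name': 'delete_account', 'parameters': {'username': 'Bob'}}]}
print(parse_tool_call("What is the weather today?"))
# {'tool_calls': [{'name': 'unknown', 'parameters': {}}]}
```

Inputs that match no keyword still produce a single tool call named "unknown" with empty parameters, so the output always has the expected shape.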

Checks for Function Calling

For function call evaluations with reference answers, we recommend using the function_call_ast_validator check. If no reference answers are available, we recommend the function_call_validator.
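To make the idea of checking against a reference answer concrete, here is a deliberately simplified sketch of comparing a model's tool calls with one expected call. It is not how `function_call_ast_validator` is implemented; `matches_reference` and the sample data are hypothetical, and the comparison rules (exact name match, exact value match, required parameters must be present) are assumptions chosen for illustration.

```python
def matches_reference(expected, output):
    """Return True if any tool call in `output` matches the expected call."""
    for call in output.get("tool_calls", []):
        if call.get("name") != expected["name"]:
            continue
        params = call.get("parameters", {})
        matched = True
        for param_name, spec in expected["parameter_definitions"].items():
            if param_name not in params:
                # A missing parameter only fails the match if it is required.
                if spec["required"]:
                    matched = False
                    break
                continue
            if params[param_name] != spec["value"]:
                matched = False
                break
        if matched:
            return True
    return False


# Hypothetical reference answer and model outputs.
reference = {
    "name": "delete_account",
    "parameter_definitions": {
        "username": {"value": "Bob", "type": "str", "required": True},
    },
}
good = {"tool_calls": [{"name": "delete_account", "parameters": {"username": "Bob"}}]}
bad = {"tool_calls": [{"name": "delete_account", "parameters": {"username": "Alice"}}]}

print(matches_reference(reference, good), matches_reference(reference, bad))
```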

For more options, see our code-based and judge-based function call checks.

Once you have picked or created checks that fit your needs, run them by calling run_test() on your model and passing the check names in the checks argument:

from okareo_api_client.models.test_run_type import TestRunType

model_under_test.run_test(
    name="My Function Call Evaluation",
    scenario=tool_scenario.scenario_id,
    test_run_type=TestRunType.NL_GENERATION,
    checks=[
        "function_call_ast_validator",
        "function_call_validator"
    ],
)