Get Started with Function Calling Evaluation
Agents interact with the external world via function calling, i.e. generating valid API calls that are passed to an execution environment. With Okareo's function calling evaluations, you can measure the accuracy of your agent's function calls.
What do you need?
You will need an environment for running Okareo. TypeScript and Python are both available. Please see the SDK sections for more on how to set up each.
Cookbook examples for this guide are available:
- TypeScript Cookbook (Coming soon!)
Scenarios for Function Calling
In a function calling scenario, you will need to specify the expected function call in each scenario row's result field. This should resemble the following:
{
    "name": str, # the name of the function to be called
    "parameter_definitions": {
        "parameter_1": {
            "value": ...,
            "type": str | bool | int | float | dict,
            "required": bool,
        },
        ...
    }
}
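As a concrete instance of this shape, the sketch below fills in the template for a hypothetical delete_account function with a single username parameter. Two things here are assumptions rather than anything the schema above confirms: the "type" field is written as a type-name string, and the helper is_valid_expected_call is invented for illustration and is not part of the Okareo SDK.

```python
# Expected function call for one scenario row's result field.
# Assumption: "type" is expressed as a type-name string such as "str".
expected_call = {
    "name": "delete_account",
    "parameter_definitions": {
        "username": {
            "value": "Bob",
            "type": "str",
            "required": True,
        },
    },
}


def is_valid_expected_call(call):
    """Hypothetical helper: check that a dict follows the shape shown above."""
    if not isinstance(call, dict) or not isinstance(call.get("name"), str):
        return False
    params = call.get("parameter_definitions")
    if not isinstance(params, dict):
        return False
    required_keys = {"value", "type", "required"}
    # Every parameter definition must carry value, type, and required.
    return all(
        isinstance(definition, dict) and required_keys.issubset(definition)
        for definition in params.values()
    )


print(is_valid_expected_call(expected_call))
```

A local check like this can catch malformed rows before a scenario is uploaded, but it says nothing about whether Okareo itself will accept the row.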
CustomModels for Function Calling
To use a function-call capable model in Okareo, you can define the invoke method of a CustomModel. The output of the model should be formatted as follows:
{
    "tool_calls": [
        {
            "name": str, # the name of the called function
            "parameters": {
                "parameter_1": ..., # value of parameter_1
            }
        }
    ]
}
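For illustration, here is what a filled-in output might look like for a hypothetical create_account call. The function name, the parameter, and the helper has_tool_call_shape are assumptions made for this sketch; the helper is not an Okareo API.

```python
import json

# A model output with a single tool call, following the format above.
model_output = {
    "tool_calls": [
        {
            "name": "create_account",
            "parameters": {"username": "Alice"},
        }
    ]
}


def has_tool_call_shape(output):
    """Hypothetical helper: verify the top-level structure of a model output."""
    calls = output.get("tool_calls") if isinstance(output, dict) else None
    if not isinstance(calls, list):
        return False
    # Each tool call needs a string name and a dict of parameters.
    return all(
        isinstance(call, dict)
        and isinstance(call.get("name"), str)
        and isinstance(call.get("parameters"), dict)
        for call in calls
    )


print(json.dumps(model_output, indent=2))
print(has_tool_call_shape(model_output))
```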
For an illustrative example of a CustomModel that uses function calling, see the following snippet.
from okareo.model_under_test import CustomModel, ModelInvocation


class FunctionCallModel(CustomModel):
    def __init__(self, name):
        super().__init__(name)
        # usernames the model knows how to recognize in the input
        self.usernames = ["Bob", "Alice", "John"]

    def invoke(self, input_value):
        out = {"tool_calls": []}
        tool_call = {"name": "unknown"}

        # parse out the function name
        if "delete" in input_value:
            tool_call["name"] = "delete_account"
        if "create" in input_value:
            tool_call["name"] = "create_account"

        # parse out the function parameter
        tool_call["parameters"] = {}
        for username in self.usernames:
            if username in input_value:
                tool_call["parameters"]["username"] = username
                break

        # package the tool call and return
        out["tool_calls"].append(tool_call)
        return ModelInvocation(
            model_prediction=out,
            model_input=input_value
        )
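To see what this parsing logic produces without installing the SDK, the sketch below mirrors the body of invoke as a plain function. The name parse_tool_calls and the sample inputs are made up for this example; the returned dict is what the snippet above would pass as model_prediction.

```python
def parse_tool_calls(input_value, usernames=("Bob", "Alice", "John")):
    """Standalone mirror of the invoke logic above (hypothetical helper)."""
    tool_call = {"name": "unknown", "parameters": {}}

    # Pick the function name from keywords in the input.
    if "delete" in input_value:
        tool_call["name"] = "delete_account"
    if "create" in input_value:
        tool_call["name"] = "create_account"

    # Use the first known username that appears in the input.
    for username in usernames:
        if username in input_value:
            tool_call["parameters"]["username"] = username
            break

    return {"tool_calls": [tool_call]}


print(parse_tool_calls("Please delete the account for Bob"))
# {'tool_calls': [{'name': 'delete_account', 'parameters': {'username': 'Bob'}}]}
```

Note that the two keyword checks are independent, so an input containing both "delete" and "create" resolves to create_account, and an input with neither keeps the name "unknown".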
Checks for Function Calling
For function call evaluations with reference answers, we recommend using the function_call_ast_validator check. If no reference answers are available, we recommend the function_call_validator.
For more check options, check out our code-based and judge-based function call checks.
Once you have picked or created checks that fit your needs, you can run these checks by calling run_test() on your model with the following command:
model_under_test.run_test(
    name="My Function Call Evaluation",
    scenario=tool_scenario.scenario_id,
    test_run_type=TestRunType.NL_GENERATION,
    checks=[
        "function_call_ast_validator",
        "function_call_validator"
    ],
)
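To build intuition for what a reference-based check compares, the sketch below matches a predicted tool call against an expected function call in the scenario result format. This is not how Okareo implements function_call_ast_validator; the function matches_reference and its matching rules (exact name match, exact value match, optional parameters may be omitted) are assumptions chosen only to illustrate the idea.

```python
def matches_reference(tool_call, expected):
    """Hypothetical comparison of one predicted tool call with a reference."""
    if tool_call.get("name") != expected["name"]:
        return False
    predicted = tool_call.get("parameters", {})
    for param, definition in expected["parameter_definitions"].items():
        if param not in predicted:
            # A missing parameter only fails the match when it is required.
            if definition["required"]:
                return False
            continue
        if predicted[param] != definition["value"]:
            return False
    return True


expected = {
    "name": "delete_account",
    "parameter_definitions": {
        "username": {"value": "Bob", "type": "str", "required": True},
    },
}
prediction = {"name": "delete_account", "parameters": {"username": "Bob"}}

print(matches_reference(prediction, expected))
```

For actual evaluations, rely on the built-in checks named above rather than a local comparison like this one.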