Get Started with Function Calling Evaluation
Agents interact with the external world via function calling, i.e. generating valid API calls that are passed to an execution environment. With Okareo's function calling evaluations, you can measure the accuracy of your agent's function calls.
What do you need?
You will need an environment for running Okareo. TypeScript and Python are both available. Please see the SDK sections for more on how to set up each.
Cookbook examples for this guide are available:
- Colab Notebook (Command-R)
- Colab Notebook (Generic)
- Typescript Cookbook (Coming soon!)
Scenarios for Function Calling
In a function calling scenario, you will need to specify the expected function call in each scenario row's result field. This should resemble the following:
{
    "name": str,  # the name of the function to be called
    "parameter_definitions": {
        "parameter_1": {
            "value": ...,
            "type": str | bool | int | float | dict,
            "required": bool,
        },
        ...
    }
}
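As a concrete illustration of the shape above, here is a minimal sketch that builds an expected function call as a plain Python dict. The helper name `expected_call` is hypothetical and not part of the Okareo SDK, and the sketch assumes that the "type" field holds the type's name as a string (for example "str"); adjust it if your scenarios encode types differently.

```python
import json


def expected_call(name, params):
    """Build an expected function call in the shape shown above.

    `params` maps each parameter name to a (value, required) tuple.
    Hypothetical helper for illustration; not part of the Okareo SDK.
    """
    definitions = {}
    for param_name, (value, required) in params.items():
        definitions[param_name] = {
            "value": value,
            # Assumption: the type is recorded by name, e.g. "str" or "int".
            "type": type(value).__name__,
            "required": required,
        }
    return {"name": name, "parameter_definitions": definitions}


# Expected call for an input such as "please delete Bob's account".
expected = expected_call("delete_account", {"username": ("Bob", True)})
print(json.dumps(expected, indent=2))
```

The resulting dict is what you would place in the result field of the corresponding scenario row.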
CustomModels for Function Calling
To use a function-call capable model in Okareo, you can define the invoke method of a CustomModel. The output of the model should be formatted as follows:
{
    "tool_calls": [
        {
            "name": str,  # the name of the called function
            "parameters": {
                "parameter_1": ...,  # value of parameter_1
            }
        }
    ]
}
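Before wiring a model into an evaluation, it can help to confirm locally that its output follows the format above. The following is a minimal sketch of such a check; `validate_tool_output` is a hypothetical helper written for this guide, not an Okareo function, and it tests only the structure described here.

```python
def validate_tool_output(output):
    """Return a list of problems found in a model output; empty means OK.

    Hypothetical helper: checks only the structure described above.
    """
    if not isinstance(output, dict):
        return ["output is not a dict"]
    tool_calls = output.get("tool_calls")
    if not isinstance(tool_calls, list):
        return ['"tool_calls" is missing or not a list']

    problems = []
    for i, call in enumerate(tool_calls):
        if not isinstance(call, dict):
            problems.append(f"tool_calls[{i}] is not a dict")
            continue
        if not isinstance(call.get("name"), str):
            problems.append(f"tool_calls[{i}] has no string name")
        if not isinstance(call.get("parameters"), dict):
            problems.append(f"tool_calls[{i}] has no parameters dict")
    return problems


good = {"tool_calls": [{"name": "delete_account", "parameters": {"username": "Bob"}}]}
bad = {"tool_calls": [{"name": "delete_account"}]}

print(validate_tool_output(good))
print(validate_tool_output(bad))
```

An empty list means the output has the expected structure; each string in a non-empty list names one structural problem.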
For an illustrative example of a CustomModel that uses function calling, see the following snippet.
from okareo.model_under_test import CustomModel, ModelInvocation


class FunctionCallModel(CustomModel):
    def __init__(self, name):
        super().__init__(name)
        self.usernames = ["Bob", "Alice", "John"]

    def invoke(self, input_value):
        out = {"tool_calls": []}
        tool_call = {"name": "unknown"}

        # parse out the function name
        if "delete" in input_value:
            tool_call["name"] = "delete_account"
        if "create" in input_value:
            tool_call["name"] = "create_account"

        # parse out the function parameter
        tool_call["parameters"] = {}
        for username in self.usernames:
            if username in input_value:
                tool_call["parameters"]["username"] = username
                break

        # package the tool call and return
        out["tool_calls"].append(tool_call)
        return ModelInvocation(
            model_prediction=out,
            model_input=input_value
        )
Checks for Function Calling
The following predefined checks are available to help you evaluate your function-calling agents.
- is_function_correct: Checks if the generated function call in the model_output matches the expected function call in the scenario_result.
- are_required_params_present: Checks if the generated parameters in the model_output contain the required parameters in the scenario_result.
- are_all_params_expected: Checks if the generated parameter names in the model_output are expected based on the schema in the scenario_result.
- do_param_values_match: Checks if each specified parameter value in the scenario_result matches the corresponding parameter value in the model_output.
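To make the four descriptions above concrete, here is a rough local sketch of the comparisons they describe, applied to a single tool call and a single expected call. This is not Okareo's implementation: the function names are hypothetical, and details such as how missing optional parameters are treated are assumptions made for illustration.

```python
def function_matches(call, expected):
    # The generated function name equals the expected one.
    return call.get("name") == expected["name"]


def required_present(call, expected):
    # Every parameter marked required appears in the generated call.
    required = {
        name
        for name, definition in expected["parameter_definitions"].items()
        if definition.get("required")
    }
    return required <= set(call.get("parameters", {}))


def all_expected(call, expected):
    # No generated parameter name falls outside the expected schema.
    return set(call.get("parameters", {})) <= set(expected["parameter_definitions"])


def values_match(call, expected):
    # Assumption: every expected parameter with a "value" must be present and equal.
    params = call.get("parameters", {})
    return all(
        name in params and params[name] == definition["value"]
        for name, definition in expected["parameter_definitions"].items()
        if "value" in definition
    )


expected = {
    "name": "delete_account",
    "parameter_definitions": {
        "username": {"value": "Bob", "type": "str", "required": True},
    },
}
# The generated call adds an extra "force" parameter that the schema lacks.
call = {"name": "delete_account", "parameters": {"username": "Bob", "force": True}}

results = {
    "function_matches": function_matches(call, expected),
    "required_present": required_present(call, expected),
    "all_expected": all_expected(call, expected),
    "values_match": values_match(call, expected),
}
print(results)
```

In this example only the third comparison fails, because the generated call includes a parameter that the expected schema does not define.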
You can run these checks by calling run_test() on your model with the following command:
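from okareo_api_client.models.test_run_type import TestRunType

# model_under_test and tool_scenario are assumed to have been created beforehand
model_under_test.run_test(
    name="My Function Call Evaluation",
    scenario=tool_scenario.scenario_id,
    test_run_type=TestRunType.NL_GENERATION,
    checks=[
        "is_function_correct",
        "are_required_params_present",
        "are_all_params_expected",
        "do_param_values_match",
    ],
)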