Get Started with Function Calling Evaluation

Agents interact with the external world through function calling: they generate valid API calls that are passed to an execution environment. With Okareo's function calling evaluations, you can measure the accuracy of your agent's function calls.

What do you need?

You will need an environment for running Okareo. TypeScript and Python are both available. Please see the SDK sections for more on how to set up each.

Cookbook examples for this guide are available:

Colab Notebook (Command-R)
Colab Notebook (Generic)
TypeScript Cookbook (Coming soon!)

Scenarios for Function Calling

In a function calling scenario, you will need to specify the expected function call in each scenario row's result field. This should resemble the following:

{
    "name": str, # the name of the function to be called
    "parameter_definitions": {
        "parameter_1": {
            "value": ...,
            "type": str | bool | int | float | dict,
            "required": bool,
        },
        ...
    }
}
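As a concrete illustration, an expected result for a hypothetical delete_account function with one required string parameter might look like the sketch below. The function name, parameter name, and value are made up for this example, and writing the type as the string "str" is an assumption about how the template's type placeholder is meant to be filled in.

```python
import json

# Hypothetical expected function call for an input such as
# "Please delete the account for Bob".
expected_result = {
    "name": "delete_account",
    "parameter_definitions": {
        "username": {
            "value": "Bob",
            "type": "str",  # assumption: the type is given as a string name
            "required": True,
        },
    },
}

# The structure is plain JSON-compatible data, so it round-trips unchanged.
assert json.loads(json.dumps(expected_result)) == expected_result
```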

`CustomModel`s for Function Calling

To use a function-call capable model in Okareo, you can define the invoke method of a CustomModel. The output of the model should be formatted as follows:

{
    "tool_calls": [
        {
            "name": str, # the name of the called function
            "parameters": {
                "parameter_1": ..., # value of parameter_1
            }
        }
    ]
}
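Before wiring a model into Okareo, it can help to confirm locally that its output has this shape. The helper below is a minimal sketch, not part of the Okareo SDK: is_tool_call_output is a hypothetical name, and treating both "name" and "parameters" as mandatory is an assumption based on the template above.

```python
def is_tool_call_output(output):
    """Return True if `output` matches the tool_calls shape shown above."""
    if not isinstance(output, dict):
        return False
    calls = output.get("tool_calls")
    if not isinstance(calls, list):
        return False
    for call in calls:
        if not isinstance(call, dict):
            return False
        # Assumption: every call needs a string name and a dict of parameters.
        if not isinstance(call.get("name"), str):
            return False
        if not isinstance(call.get("parameters"), dict):
            return False
    return True


good = {"tool_calls": [{"name": "create_account", "parameters": {"username": "Alice"}}]}
bad = {"tool_calls": [{"name": "create_account"}]}  # missing "parameters"
print(is_tool_call_output(good), is_tool_call_output(bad))  # True False
```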

For an illustrative example of a CustomModel that uses function calling, see the following snippet.

from okareo.model_under_test import CustomModel, ModelInvocation

class FunctionCallModel(CustomModel):
    def __init__(self, name):
        super().__init__(name)
        self.usernames = ["Bob", "Alice", "John"]

    def invoke(self, input_value):
        out = {"tool_calls": []}
        tool_call = {"name": "unknown"}

        # parse out the function name
        if "delete" in input_value:
            tool_call["name"] = "delete_account"
        if "create" in input_value:
            tool_call["name"] = "create_account"

        # parse out the function parameter
        tool_call["parameters"] = {}
        for username in self.usernames:
            if username in input_value:
                tool_call["parameters"]["username"] = username
                break

        # package the tool call and return
        out["tool_calls"].append(tool_call)
        return ModelInvocation(
            model_prediction=out,
            model_input=input_value
        )
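The keyword matching inside invoke can be exercised without Okareo by lifting it into a plain function. The sketch below mirrors the logic of the snippet above; parse_tool_call and USERNAMES are hypothetical names introduced only for this local smoke test.

```python
USERNAMES = ["Bob", "Alice", "John"]


def parse_tool_call(input_value, usernames=USERNAMES):
    """Mirror the invoke logic above and return the tool_calls payload."""
    tool_call = {"name": "unknown", "parameters": {}}

    # Pick the function name from keywords in the input.
    if "delete" in input_value:
        tool_call["name"] = "delete_account"
    if "create" in input_value:
        tool_call["name"] = "create_account"

    # Use the first known username that appears in the input.
    for username in usernames:
        if username in input_value:
            tool_call["parameters"]["username"] = username
            break

    return {"tool_calls": [tool_call]}


print(parse_tool_call("Please delete the account for Bob"))
# {'tool_calls': [{'name': 'delete_account', 'parameters': {'username': 'Bob'}}]}
print(parse_tool_call("What is the weather today?"))
# {'tool_calls': [{'name': 'unknown', 'parameters': {}}]}
```

Note that the matching is case-sensitive, so an input beginning with "Delete" would fall through to "unknown".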

Checks for Function Calling

For function call evaluations with reference answers, we recommend using the function_call_ast_validator check. If no reference answers are available, we recommend the function_call_validator.
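To make the idea of checking against a reference answer concrete, the sketch below compares one predicted tool call with an expected result in the scenario format. It is not how function_call_ast_validator is implemented; matches_reference is a hypothetical helper, and its rules (exact value equality, no parameters beyond those in the reference) are assumptions chosen for illustration.

```python
def matches_reference(predicted_call, expected):
    """Compare one predicted tool call with a scenario-style expected result."""
    if predicted_call.get("name") != expected.get("name"):
        return False

    predicted_params = predicted_call.get("parameters", {})
    definitions = expected.get("parameter_definitions", {})

    for param, spec in definitions.items():
        if param not in predicted_params:
            # A missing parameter only fails the match when it is required.
            if spec.get("required", False):
                return False
            continue
        if predicted_params[param] != spec.get("value"):
            return False

    # Assumption: parameters the reference does not define count as a mismatch.
    return all(param in definitions for param in predicted_params)


expected = {
    "name": "delete_account",
    "parameter_definitions": {
        "username": {"value": "Bob", "type": "str", "required": True},
    },
}
print(matches_reference({"name": "delete_account", "parameters": {"username": "Bob"}}, expected))  # True
print(matches_reference({"name": "delete_account", "parameters": {}}, expected))  # False
```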

For more options, see our code-based and judge-based function call checks.

Once you have picked or created checks that fit your needs, you can run them by passing their names to run_test() on your model:

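from okareo_api_client.models.test_run_type import TestRunType

# model_under_test and tool_scenario are assumed to have been created earlier
model_under_test.run_test(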
    name="My Function Call Evaluation",
    scenario=tool_scenario.scenario_id,
    test_run_type=TestRunType.NL_GENERATION,
    checks=[
        "function_call_ast_validator",
        "function_call_validator"
    ],
)
