Overview
What problem are we solving?
AI/ML is becoming increasingly common in software development. Deterministic code, which always produces the same output given the same input, is relatively easy to test. Non-deterministic software components, which can produce different outputs given the same input, require new approaches.
Manual testing and manually monitored production feedback loops can be used to improve models. But doing so is arduous, time-consuming, and risky.
Enter Okareo. Our focus is on helping you establish reliable AI throughout your development lifecycle.
Getting Started
When seeking reliability, it is not hard for model evaluation to get complicated, fast. But let's start with something simple to get a feel for Okareo.
Okareo Platform Basics
Okareo provides SDK, CLI, and notebook support for Python and TypeScript.
If you prefer Python notebooks, feel free to start with the examples below. This guide focuses on the CLI+SDK path, which is useful for local development and CI.
Sign-Up and get your API Token
To use Okareo you will need an API Token, some data, and a model to test.
- If you haven't already, you can sign up for a free account with full access.
- Provision your API Token from the Home page or from Settings > API Token.
- We suggest making the token available in your environment as OKAREO_API_KEY:
export OKAREO_API_KEY="<YOUR_TOKEN>"
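To confirm the token is set, and to keep it available in future sessions, you can check it and append the export to your shell profile. A minimal sketch, assuming a bash- or zsh-style shell (~/.zshrc is a placeholder for your own profile file):
# verify the token is visible in the current shell
echo $OKAREO_API_KEY
# persist it for new sessions (adjust the profile file to your shell)
echo 'export OKAREO_API_KEY="<YOUR_TOKEN>"' >> ~/.zshrc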
Install the Okareo CLI
Okareo includes a simple CLI script runner that you can use with Python, TypeScript, and even YAML config to drive evaluation. To get you started quickly, we provide simple Python and TypeScript examples.
Download the latest version of the Okareo CLI (okareo) for your development environment.
macOS Silicon:
curl -O -L https://github.com/okareo-ai/okareo-cli/releases/download/v0.0.19/okareo_0.0.19_darwin_arm64.tar.gz
tar -xvf okareo_0.0.19_darwin_arm64.tar.gz
Windows:
# Download and Extract the archive (requires tar in PowerShell 5.1+)
Invoke-WebRequest -Uri https://github.com/okareo-ai/okareo-cli/releases/download/v0.0.19/okareo_0.0.19_windows_386.tar.gz -OutFile okareo_0.0.19_windows_386.tar.gz
tar -xvf okareo_0.0.19_windows_386.tar.gz
Linux:
curl -O -L https://github.com/okareo-ai/okareo-cli/releases/download/v0.0.19/okareo_0.0.19_linux_386.tar.gz
tar -xvf okareo_0.0.19_linux_386.tar.gz
You can find all the current releases at the Okareo-CLI GitHub repo.
Add Okareo to your path after unpacking:
export PATH="$PATH:[LOCAL_PATH_WHERE_YOU_UNPACKED_OKAREO]/bin"
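As with the API token, this export only applies to the current shell session. If you want the CLI available in every new shell, you can append the same line to your shell profile; a sketch, assuming zsh and a placeholder unpack location:
# persist the PATH update (paths and profile file are placeholders)
echo 'export PATH="$PATH:$HOME/okareo/bin"' >> ~/.zshrc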
Run okareo -v to verify your installation before moving to the next step.
Let's Run an Evaluation!
Step 1: Create an Okareo Project
Okareo projects are language-specific: typescript or python.
okareo init --language typescript
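If your project is Python-based, the same flag presumably accepts python as its value (an assumption based on the two languages listed above):
# assumes the CLI accepts "python" as a --language value
okareo init --language python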
The init command creates a .okareo folder with a config.yml file and a folder called flows. Evaluation and fine-tuning flows you want to run from the CLI are placed in the flows folder and can be run individually or as a group.
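Based on that description, the resulting project layout looks roughly like this (the flow file name is a placeholder you will create in the next step):
.okareo/
├── config.yml
└── flows/
    └── function_eval.ts   # or function_eval.py, added in Step 2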
Step 2: Create an Evaluation Flow
Everyone's AI/Model evaluation needs are different. We have provided some common examples you can build on.
Here we will evaluate a function call. The goal is to determine whether the "model" (in this case a simple code block) correctly interprets the request and responds with a signature-complete API response. In this mock example, failure could mean accidentally deleting a valid user.
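Concretely, the custom model in the examples below returns a prediction of roughly this shape for the first scenario input, and the checks compare it against the scenario's expected result (a sketch of the structure used in the example code, not a formal schema):
{
  "tool_calls": [
    { "name": "delete_account", "parameters": { "username": "Bob" } }
  ]
}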
- TypeScript
- Python
Save the following script as function_eval.ts in your .okareo/flows folder created by the okareo init command.
// Save this flow as function_eval.ts and place it in your .okareo/flows folder
import { Okareo, RunTestProps, TestRunType, CustomModel } from "okareo-ts-sdk";

const main = async () => {
    try {
        const okareo = new Okareo({ api_key: process.env.OKAREO_API_KEY });
        // use the default "Global" project
        const project_id = (await okareo.getProjects()).find(p => p.name === 'Global')?.id;

        // seed scenarios: each input is paired with the expected function call
        const seedData = [
            { input: "can you delete my account? my name is Bob", result: { name: "delete_account", parameter_definitions: { username: { value: "Bob", type: "str", required: true } } } },
            { input: "how do I make an account? I'm Alice", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
            { input: "how do I create an account?", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } },
            { input: "my name is John. how do I create a project?", result: { name: "create_account", parameter_definitions: { username: { value: "Alice", type: "str", required: true } } } }
        ];

        const scenario: any = await okareo.create_scenario_set({
            name: `Function Call Demo Scenario - ${(Math.random() + 1).toString(36).substring(7)}`,
            project_id: project_id,
            seed_data: seedData
        });

        // the "model" under test: a simple code block that builds a tool call from the input
        const function_call_model = {
            type: 'custom',
            invoke: async (input_value) => {
                const usernames = ["Alice", "Bob", "Charlie"];
                const out: { tool_calls: { name: string; parameters: { [key: string]: any } }[] } = { tool_calls: [] };
                const tool_call: { name: string; parameters: { [key: string]: any } } = { name: "unknown", parameters: {} };
                // parse out the function name
                if (input_value.includes("delete")) {
                    tool_call.name = "delete_account";
                }
                if (input_value.includes("create")) {
                    tool_call.name = "create_account";
                }
                // parse out the function parameter
                for (const username of usernames) {
                    if (input_value.includes(username)) {
                        tool_call.parameters["username"] = username;
                        break;
                    }
                }
                // package the tool call and return
                out.tool_calls.push(tool_call);
                return {
                    model_prediction: out,
                    model_input: input_value,
                    model_output_metadata: {},
                };
            }
        } as CustomModel;

        // register the model to use in the test run
        const model = await okareo.register_model({
            name: 'Function Call Demo Model',
            project_id: project_id,
            models: function_call_model,
            update: true,
        });

        // run the evaluation with the built-in function call checks
        const eval_run: any = await model.run_test({
            name: 'Function Call Demo Evaluation',
            project_id: project_id,
            scenario_id: scenario.scenario_id,
            calculate_metrics: true,
            type: TestRunType.NL_GENERATION,
            checks: ["is_function_correct", "are_required_params_present", "are_all_params_expected", "do_param_values_match"],
        } as RunTestProps);

        console.log(`View the evaluation in the Okareo app: ${eval_run.app_link}`);
    } catch (e) {
        console.error(JSON.stringify(e, null, 2));
    }
};

main();
Save the following script as function_eval.py in your .okareo/flows folder created by the okareo init command.
# Save this flow as function_eval.py and place it in your .okareo/flows folder
import os
import random
import string

from okareo import Okareo
from okareo.model_under_test import CustomModel, ModelInvocation
from okareo_api_client.models.scenario_set_create import ScenarioSetCreate
from okareo_api_client.models.seed_data import SeedData
from okareo_api_client.models.test_run_type import TestRunType

OKAREO_API_KEY = os.environ.get("OKAREO_API_KEY")
okareo = Okareo(OKAREO_API_KEY)


def random_string(length: int) -> str:
    return "".join(random.choices(string.ascii_letters, k=length))


seed_data = [
    SeedData(
        input_="can you delete my account? my name is Bob",
        result={"name": "delete_account", "parameter_definitions": {"username": {"value": "Bob", "type": "str", "required": True}}},
    ),
    SeedData(
        input_="how do I make an account? I'm Alice",
        result={"name": "create_account", "parameter_definitions": {"username": {"value": "Alice", "type": "str", "required": True}}},
    ),
    SeedData(
        input_="how do I create an account?",
        result={"name": "create_account", "parameter_definitions": {"username": {"value": "Alice", "type": "str", "required": True}}},
    ),
    SeedData(
        input_="my name is John. how do I create a project?",
        result={"name": "create_account", "parameter_definitions": {"username": {"value": "Alice", "type": "str", "required": True}}},
    ),
]

tool_scenario = okareo.create_scenario_set(
    ScenarioSetCreate(
        name=f"Function Call Demo Scenario - {random_string(5)}",
        seed_data=seed_data,
    )
)


class FunctionCallModel(CustomModel):
    def __init__(self, name):
        super().__init__(name)
        self.usernames = ["Bob", "Alice", "John"]

    def invoke(self, input_value):
        out = {"tool_calls": []}
        tool_call = {"name": "unknown"}
        # parse out the function name
        if "delete" in input_value:
            tool_call["name"] = "delete_account"
        if "create" in input_value:
            tool_call["name"] = "create_account"
        # parse out the function parameter
        tool_call["parameters"] = {}
        for username in self.usernames:
            if username in input_value:
                tool_call["parameters"]["username"] = username
                break
        # package the tool call and return
        out["tool_calls"].append(tool_call)
        return ModelInvocation(
            model_prediction=out,
            model_input=input_value,
        )


# Register the model to use in the test run
mut_name = "Function Call Demo Model"
model_under_test = okareo.register_model(
    name=mut_name,
    model=[FunctionCallModel(name=FunctionCallModel.__name__)],
    update=True,
)

eval_name = "Function Call Demo Evaluation"
evaluation = model_under_test.run_test(
    name=eval_name,
    scenario=tool_scenario.scenario_id,
    test_run_type=TestRunType.NL_GENERATION,
    checks=[
        "is_function_correct",
        "are_required_params_present",
        "are_all_params_expected",
        "do_param_values_match",
    ],
)

print(f"See results in Okareo: {evaluation.app_link}")
Step 3: Run your first flow
Let's run your first Okareo flow. In this case we are using the -f flag to run just the function_eval script. For a list of available commands, you can use okareo --help.
okareo run -f function_eval
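You can also run everything in the flows folder as a group. Based on the CLI behavior described above, omitting the -f flag should run every flow; treat this as a sketch and confirm with okareo --help:
# runs all flows in .okareo/flows (assumption: no -f runs the full set)
okareo run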
Step 4: What to do next
Now that you have a working flow that creates a scenario, registers a model, and runs an evaluation, you are ready to start building and evaluating your own AI-native capabilities. Also, don't hesitate to give us feedback. Learning is golden.
Next Steps:
- Synthetic Scenario Generation: Learn about synthetically creating a behavior map of positive and negative scenarios that you can use to establish baseline metrics for your models.
- Supported Models and Approaches: Okareo includes built-in support for a wide variety of model types and providers, including custom models.
- Evaluation and Checks: Evaluations are driven through discrete Checks that provide specific metrics. Checks can be deterministic code or based on an AI judge. Learn more about built-in and custom checks for evaluation.
- Fine Tuning: Assembling and organizing data sets for fine tuning is a native element of Okareo. Learn more from our Founding Data Scientist in this blog on bootstrapping fine tuning with Okareo.